Twenty-Ninth Annual Conference on Educational 
Measurements 


COMMENT 


Two panel discussions, one on current theories and practices of 
educational measurement and one on the future of educational meas- 
urements, were held at the Twenty-ninth Measurements Conference. 
School superintendents from various parts of the state and faculty 
members in the field of education from the different state institutions 
of higher education in Indiana took part in these discussions. Because 
of the nature of the program, however, no report of this part of the 
conference is included in the printed proceedings. 

FACT AND FANCY IN EDUCATIONAL 
MEASUREMENTS 


S. A. Courtis, Professor of Education, University of Michigan 


THE movement for the use of measurement in education is almost 
one hundred years old, if its beginnings are taken as the first American 
survey, the measurement of the Boston schools in 1845.1 It is nearly 
fifty years since Rice published the first of his historic comparative 
studies of American schools. It is thirty-two years since Thorndike 
issued his handwriting scale, the first calibrated instrument for the 
measurement of an educational product. Whatever event is chosen 
as its birthday, the measurement movement is old enough to make 
appraisal profitable, and today, as we stand at the close of one his- 
torical era and the beginning of a new order, it is quite suitable that 
an old man who has known Rice personally and has lived through a 
generation of measurement effort should share his experiences and 
conclusions with young men and women upon whose shoulders will 
rest the responsibility for the developments of the future. 

From my point of view, there have been five distinct trends or 
emphases in the history of measurement, the chief leaders of which 
are listed in Table I. The first period, psychological testing, began 
about 1875, when psychology went experimental and turned from 
speculative description to quantitative measurement. The general ideas 
of tests and measurements were developed during this period. Ex- 
citement ran high. Psychologists thought that at last human behav- 
ior was to be brought under scientific control. Individual differences 
were discovered. But, alas and alack, the individual differences would 
not correlate with each other, and one could not use a knowledge of 
them to predict behavior in new situations. So the movement grad- 
ually died away. However, Cattell brought it to this country in about 
1885 and developed what were the forerunners of our modern intelli- 
gence tests. Also he trained Thorndike and others who later were 
to make effective use of some of his ideas. 

About this time, also, another person trained abroad, J. M. Rice, 
conceived the idea of presenting a given examination under standard 
conditions to different school systems, and another forward step in 
scientific progress was taken. In this second phase the emphasis was 
upon the use of controlled conditions of testing. Not much came of 
Rice’s work immediately, but later it was to bear much fruit. 

The study of individual differences, which as a movement had 
almost subsided, suddenly flowered into the intelligence tests of Binet 
and Simon, the fruit of which was the concept of mental age, in my 


1 Caldwell, O. W., and Courtis, S. A. Then and Now in Education, World Book 
Co., New York, 1924. 400 pp. 


2 Rice, Joseph M. Scientific Management in Education, Publishers Printing Co., 
New York, 1913. 282 pp. 


3 Thorndike, E. L. “Handwriting,” Teachers College Record 11:1-41, March, 1910. 

TABLE I. MILESTONES OF PROGRESS 

Date | Educator | Event 
1874 | Galton, Wundt | Founding of psychological laboratories 
1885 | Cattell | Development of mental tests; study of individual differences 
1897 | Rice | Use of examinations under controlled conditions 
1905 | Binet | Individual intelligence tests; mental age 
1908 | Stone | Standardization of the examination 
1909 | Courtis | Construction of rate tests in arithmetic; development of units based on objective criteria 
1910 | Thorndike | Calibration of handwriting scale on basis of equal difference theorem; development of units on basis of judgments of relative merit 
1912 | Ayres | Handwriting; development of units on basis of position in a distribution 
1913 | Buckingham | Spelling; determination of relative difficulty in variability units 
1914 | Trabue | Completion test; variability units based on scores 
1916 | Terman | Standardization of individual intelligence testing 
1917 | Otis | Group intelligence tests 
1921 | McCall | Standardization of group from which distributions for scaling are derived 
1922 | Baldwin | Study of physical growth from birth to maturity 
1923 | Downey | Diagnosis through change of conditions; differential testing 
1926 | Courtis | Discovery that Gompertz law of growth can be used in educational measurements 
1927 | Thurstone | Improvement of scaling procedure by correction for increasing dispersion 
1935 | Frank | Importance of longitudinal studies of child development 


judgment the greatest single contribution yet made in this field. 
Today the age concept is widely used. We have reading age, height 
age, physiological age, moral age, etc., and the concept is still being 
expanded and refined. Binet’s tests, in about 1905, mark the begin- 
ning of intelligence testing, the third of my five phases. 

While intelligence testing was spreading, the fourth phase of 
development was also getting under way. Stone (1908), under Thorn- 
dike’s guidance, utilized the idea of Rice’s controlled examination, but 
added to it the careful standardization of the questions in the ex- 
amination. That led me in 1909 to devise rate tests composed of 
material made uniform in terms of objective criteria. Rate tests 
measure skill at a given level of difficulty, but yield extremely 
variable results. Why this happened was not known at that time, 
and Thorndike threw the weight of his influence against rate tests. 
On this very platform he ridiculed them out of court for many per- 
sons. It is probable that ultimately their unique and intrinsic worth 
will yet be recognized. Historically, however, rate tests represent an 
avenue of development largely unexplored, but my efforts to promote 
their use at least resulted in the widespread use of educational tests. 

In the meantime Thorndike, stimulated by Rice and Cattell, at- 
tacked the problem of calibration of measuring instruments and in 
1910 published the first true scale for the measurement of an educa- 
tional product. We know now that his theory was imperfect and his 
technique inadequate, but no defects can ever lessen the value of 
Thorndike’s achievement. Just as Galileo’s first crude air thermometer 
led to more precise instruments and measurements in the field of 
thermodynamics, so Thorndike’s handwriting scale has exerted great 


10 BULLETIN OF THE SCHOOL OF EDUCATION 


educational influence, familiarizing thousands with the idea of units 
and scales, and in turn paving the way for more adequate methods. 

A few years later, another contributor, Ayres, introduced a dif- 
ferent scaling procedure. In his handwriting scale, units were 
determined by position in a distribution, the sigma scaling technique 
which yields variability units. This method seized popular fancy. 
Buckingham, Trabue, McCall, Thurstone, and others quickly extended 
the technique to include assumed normal distributions, scores in 
tests, and standardized groups. From 1915 on, a flood of educational 
tests of all sorts and kinds was poured into an eager market. New 
tests are still being issued. 

In 1916 Terman published the Stanford Revision of the Binet 
Tests, and individual intelligence testing spread widely over the 
country. When the United States entered the First World War, Otis 
had a group intelligence test about ready for publication, and its 
use by the Army further popularized intelligence testing. Both edu- 
cational and intelligence tests were used by the millions. Surveyors, 
administrators, and supervisors took up the new fad avidly, but 
teachers, who are closer to the children, soon found that the tests 
were of little help. The child who makes a high score in an intelli- 
gence test is not necessarily a genius, nor is he who makes a low 
score necessarily a dummy. The child who makes a high score in an 
arithmetic or reading test is often not the one the teacher regards 
as her most effective student in these fields, nor are low scoring 
children always failures. Teachers, as a class, have been eagerly 
active in trying out each new instrument of measurement, but after 
a few years they generally become apathetic in the use of these 
instruments because of their ineffectiveness. So, when the depression 
came along and money was scarce, testing programs were cut or 
eliminated. Today many school systems still continue to give and 
score tests because it is an established custom, but it is yet to be proved 
that the general use of tests and measurement adds one whit to the 
efficiency of teaching. Today the constancy of the I.Q. is called 
in question, educational conventions devote their sessions to the 
airing of subjective opinions, and the movement for the scientific 
study of educational problems is in an eclipse. This conference is 
one of the few remaining conferences devoted exclusively to education- 
al measurement, and a comparison of the addresses and speakers through 
the twenty-nine years of its existence will, I fancy, substantiate some 
of the ups and downs of the measurement movement. 

However, science is not on trial, and the present eclipse of edu- 
cational measurement is not a total eclipse. Already the signs of 
a new development are increasing. A new emphasis, the longitudinal 
study of child development, is making itself felt. The beginnings 
of this new emphasis run far back into the past, but for present 
purposes I am going to take the publication of Baldwin, “The Physical 
Growth of Children from Infancy to Childhood,”4 as its starting 
point. The Iowa Institute for the Study of Child Welfare was the 


4 Baldwin, Bird T. “The Physical Growth of Children from Infancy to Childhood.” 
University of Iowa Studies in Child Welfare, Vol. 1, No. 1, University of 
Iowa, Iowa City, 1922, pp. 1-411. 
first so far as I know to begin accumulating records of repeated 
measurements of individual children and, while Baldwin never com- 
pletely emancipated himself from mass statistics, he did recognize 
the potential values of individual growth curves. The Harvard Growth 
Study was begun at about this same time (1922), and today there are 
quite a few centers at which the records of individuals in successive 
measurements are being preserved and studied. 

My own search for the causes of variation in test scores led 
me to a study of growth curves, and, on my way home from the 
Educational Measurements Conference at Indiana University in 1926, 
stimulated by my own address at this conference, I made the fun- 
damental discovery that the Gompertz formula, developed in 1825 in 
England for life insurance statistics, applies universally to all growth 
curves when growth takes place under controlled conditions. This 
discovery opened the door for adequate and consistent statistical 
treatment of growth data, and eventually I believe it will revivify 
the whole measurement movement. In the meantime, studies of child 
development are increasing, and the emphasis by Frank in 1935, on 
the need for the longitudinal approach, has stimulated a whole crop 
of new studies, new discoveries, and new interpretations. 

So much for historical review. The question still remains, “What 
fruits in fact and in fancy have resulted from two generations of 
experience with psychological measurements?” Well, it must be con- 
fessed that psychological science is still largely a pseudo science; 
educational measurement, as conventionally practiced, is largely fancy. 
It is a fact that children differ, but our studies of child development 
are so inadequate that the best of our interpretations must be classed 
as pretty largely fanciful. It is a fact that we have many instru- 
ments of measurement, but our assumptions that they measure in- 
telligence, reading, personality, etc. can easily be proved to be gross 
superstitions. It is a fact that we have a number of scaling tech- 
niques, but it can readily be shown that these techniques are based 
on unwarranted assumptions and yield at best only units of perform- 
ance, not units of ability. It is a fact that our libraries are filled 
with countless records of experimentation and statistical analyses, 
having all the form and trappings of scientific procedure, but it is 
not difficult to show that all this fine appearance is appearance only, 
that all our experimentation likewise rests on untenable foundations 
and really does not yield enduring truth. Please do not misunderstand 
me. The measurement movement has been of great benefit, but it 
has also done much harm. We have made progress and we should 
rejoice in that progress, just as a fond mother rejoices when her baby 
begins to creep, in spite of the fact that it has not even thought 
about walking. 

Educational measurement is still on all fours. All the experi- 
mental results must be reworked on a new basis. New interpretations 
lie ahead. The underbrush has been cleared away, the road to progress 
plainly delineated. The pioneers of my generation have at least un- 
covered their own mistakes. It is for you youngsters to build a new 
and more enduring structure on the foundations we have laid. 

As a measurement man, I, of course, do not expect you to take 
my word for these dogmatic statements. Instead, I shall ask you to 
consider the evidence on which they are based. A year or two ago 
in a certain city a series of reading tests were given in six 5A 
grades, and the scores of the children were converted to grade levels 
(Table II). 


TABLE II. SCORES OF MORE THAN 200 5A BOYS AND GIRLS IN STANDARD 
TESTS IN READING 

Number of children making each grade-level score 

Grade levels | Detroit Comprehension Test 3 | Detroit Vocabulary Test 6 | Stanford Achievement Test | Thorndike-McCall Test | Gates Test, Type A | Gates Test, Type C 
9 and above | | 2* | 8* | 5* | 13* | 16* 
8A | | 4 | 3 | 0 | 3 | 7 
8B | | 4 | 7 | 5 | 5 | 5 
7A | 13* | 8 | 5 | 7 | 15 | 14 
7B | 26 | 9 | 21 | 10 | 11 | 22 
6A | 8 | 10 | 22 | 51 | 21 | 24 
6B | 32 | 27 | 42 | 23 | 29 | 17 
5A | 32 | 34 | 38 | 31 | 35 | 22 
5B | 37 | 51 | 43 | 48 | 47 | 25 
4A | 36 | 46 | 26 | 25 | 22 | 16 
4B | 21 | 14 | 21 | 17 | 20 | 28 
3A | 16 | 17 | 7* | 14* | 16 | 20 
3B | 19* | 11 | | | 5* | 26* 
2A and below | | | | | | 
Total children tested | 240 | 288 | 243 | 236 | 242 | 242 

* These scores mark the limits of the available scores or norms for the given 
test. The grades so marked should be read “or above” or “or below,” as the case 
may be. 


What do you think when you read such a table? Do you note 
the wide range of scores in each grade and judge the teaching bad? 

When these results are taken at their face value and interpreted 
in the way in which tests are still frequently interpreted today, they 
indicate that our teaching of reading is hopelessly inefficient. 

In 1912 in the New York Survey Report of Tests in Arithmetic, 
I wrote, on the basis of similar data:5 


It [the table] points out plainly facts of the largest sig- 
nificance both for education generally and for the local situa- 
tion. It is the distributions of the individual scores that are 
startlingly—yes, sensationally significant. The real meaning 
is that, so far as any individual child is concerned, to say 


5 Report of Committee on School Inquiry, New York City, 1912, Part II, Section D, 
pp. 50-2. 
that he has completed the course in arithmetic in the public 
schools is to convey no information as to his ability in even 
the simplest work. He may be almost an absolute incompetent 
so far as practical work is concerned, or he may have acquired 
a degree of skill that would be adequate for any situation in 
which he is likely to find himself. As a whole, therefore, the 
table more than justifies the criticism of the efficiency of the 
[schools] in arithmetic that has yet been made by the “man on 
the street.” 


In the light of present knowledge, how silly such statements 
seem. It is humiliating to have written such pronouncements. But 
in 1912 we thought that an efficient teacher was one who could, if 
she tried hard enough, make every child learn. Today we know that 
a child’s score in a reading test does not measure reading ability, 
but reading “achievement” or “performance,” and that such a score 
is determined by his age, his sex, his intelligence, his level of maturity, 
his pattern of development, his emotional temperament, his social 
background, his character and ideals, and many similar factors, of 
which teaching is but one. It is no more truthful to call a score in a 
so-called reading test a measure of reading than it is to call it a 
measure of age, or of intelligence, or of sex, or of any other one of 
the many factors involved. 

For instance, consider the results obtained when these identi- 
cal children are measured in chronological age, height, weight, num- 
ber of permanent teeth cut, and intelligence; and the scores are also 
expressed in grade levels (Table III).6 

By no stretch of the imagination can a teacher be held respon- 
sible for the growth of her children in height, weight, dentition, 
and intelligence. But when measurement of these nonacademic ele- 
ments shows variation similar to that in reading, the fact should at 
least cause one to question why. Out of such questioning has arisen 
the present interest in growth. 

The most constant and universal fact about children is that 
they are always growing. Their dimensions at the end of a semester 
are not the same as they were at the beginning. How much do you 


6 From age-grade surveys in previous years the average age of each grade was 
determined. Next, physical and mental measurements were turned into physical and 
mental ages, and the children allocated to grade levels in terms of such ages. The 
process is exactly comparable with that by which the educational ages in Table II 
were obtained, but is less precise. 

TABLE III. SCORES OF MORE THAN 200 5A BOYS AND GIRLS IN 
NON-ACADEMIC MEASUREMENTS 

Number of children making each grade-level score 

Grade levels | Chronological age | Height | Weight | Number of teeth cut | Intelligence tests: Detroit Test | Intelligence tests: Kuhlmann-Anderson Test | Reading: Stanford Achievement Test** 
9 and above..| 1* 7* 20* 10* 2 8* 
| 2 6 9 13 0 3 
error | 0 9 15 ox 16 0 7 
5 36 23 | 25 7 5 
. ‘éiteneeaan 6 | 25 | 19 | 25 33 | 24 21 
|} 1 | 44 | 2 | 55 42 34 22 
Ge ssaNacewes | 19 | 18 25 | 44 29 64 | 42 
| | | 
| | | | | 
oe taneesenes } 34 | 39 | 31 | 42 | 26 | 33 38 
| | 
| | | | | | | 
a «bieseensan | 135 | 22 | 23 | 23 | 29 39 43 
| 29 19 12 14 | 17 2 
7 4 15 7 5 21 
| | 2 il 9 7* 
2A and below. | 2 | | 
| | | 
| | | | 
Total children | | 
measured 248 238 | 238 233 226 243 
| 


* These scores mark the limits of the available scores or norms for the given test. 
The grades so marked should be read “or above” or “or below,” as the case may be. 

** From Table II 


know about mere physical development and its effect on scores? Let 
us consider the growth of an individual in height (Figure 1). 


Fig. 1. The curve for the growth in height of an individual boy from birth to 
maturity. Measurements taken by the boy’s father, April 11, 1759 to November 11, 
1776. (From R. E. Scammon, “The First Seriatim Study of Human Growth,” 
American Journal of Physical Anthropology 10: 329-36, July-September, 1927) 
[Height plotted against years from birth, with the infancy cycle, childhood, 
and adolescence marked.] 

Today many hundreds of such curves are available for study from 
the Harvard Growth Study and from similar growth studies in other 
centers. 

In general, careful scientific analysis of such records proves that: 

1. Individual growth is cyclic in character. 

2. The various cycles of growth occur at different times in 
different individuals. 

3. Each child is unique as to times and amounts of growth, 
and should be judged in terms of his own natural standards 
and not in terms of norms derived from mass measurements. 

The pattern of growth for an individual child differs markedly 
from the pattern of growth derived from mass averages. This is 
hard to believe, but it shows up clearly in Figures 2, 3, and 4. 


Fig. 2. Comparison of individual longitudinal curves in height. (From records of 
Hamtramck public schools, Michigan) [Height in inches plotted against months of 
chronological age for three children, with constant (?) I.Q.'s: A, girl, 
subnormal, I.Q.'s 93, 94.5, 90; B, boy, normal, I.Q.'s 106, 93; C, girl, normal, 
I.Q.'s 146, 132, 111, 102. M = development at time of first menstruation, 
which usually occurs at about 70 per cent of the adolescent development.] 


This audience is composed of persons at least interested in 
measurement, or else you would not be here. Ask yourself, “How 
much do I know about the nature and growth of the pupils in my 
classes? In interpreting their scores in tests, how much allowance do 
I make for their levels of maturity, their unique patterns of develop- 
ment?” 

Fig. 3. Comparison of mass averages with longitudinal data. (From records of 
Hamtramck public schools, Michigan) [Height in inches plotted against age in 
years: mass averages by years, from 18,000 separate measurements, and repeated 
measurements of an individual boy, C. M.] 

Fig. 4. Comparison of the weight growth curves of two individual girls, measured 
every week from January to June. [Left panel, conventional method: weight 
plotted by months. The two girls were chosen as equal because they were the 
same height and nearly the same weight on a particular day. Right panel, growth 
method: per cent of development plotted against isochrons. The two girls are 
seen to be at different stages of development and growing at different rates.] 

Although on two different days the girls in Figure 4 had identical 
measurements, they were growing at different rates and to different 
maxima, as is made evident by the graph on the right. 


Children follow a universal pattern of development, but each child has his 
own physiological age scale and his own specific constants of growth. 
Consider two boys, each of whom measures three feet in height 
(Figure 5). 

Fig. 5. Illustration of prevalent fallacies in the interpretation of measurements. 
[Left panel: two boys, each 3 ft. tall. The boys are equal in height. Are they 
equal in development? Right panel: the same two as men, one 4 ft. and one 6 ft. 
tall. The men are unequal in height but equal in maturity. Both are 100% 
developed in height. 3 feet is 75% of 4 feet, but 3 feet is only 50% of 6 feet. 
Does 75% = 50%?] 


If these two boys were paired for a control experiment, would 
the pairing be adequate? The answer is “no,” and most of our experi- 
mental work is only pseudoscientific because our supposedly equal 
groups are not really equal. 
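
The arithmetic behind Figure 5 can be sketched in a few lines of Python. This is only an illustration, not a procedure taken from the address: the function name `percent_developed` is hypothetical, and the mature heights of 4 feet and 6 feet are the ones quoted in the figure.

```python
def percent_developed(current, mature):
    """Return current size as a percentage of size at maturity."""
    if mature <= 0:
        raise ValueError("mature size must be positive")
    return 100.0 * current / mature


# Two boys of identical height today (3 ft.) but different mature heights,
# as in Figure 5.
boy_a = percent_developed(3.0, 4.0)
boy_b = percent_developed(3.0, 6.0)
print(boy_a, boy_b)  # 75.0 50.0
```

Pairing the two boys on raw height treats 75 per cent and 50 per cent of development as if they were the same thing.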

But, you may say, what is true of measurements of height is 
not necessarily true of educational measurements. Right! But con- 
sider these facts (Figure 6). 


Fig. 6. Author's norms and Rose longitudinal curves made on Stanford Achievement 
Test. [Two panels of score against chronological age in years: the author's 
norms, showing no sign of effect of adolescence, and longitudinal curves 
(Rose, 1926-1929).] 

And what about intelligence testing? Do you believe in the 
constancy of the I.Q.? Longitudinal studies have revealed how cycles 
of growth operate to produce apparent inconsistencies in test scores 
(Figure 7). 


Fig. 7. Comparison of mass and longitudinal measures of intelligence. (From 
Binet Test scores, Baldwin and Stecher) [Two panels plotted against 
chronological age in months: scores as intelligence quotients, and scores as 
mental development, with mental age in months.] 


The I.Q. is really only a measure of relative development (mental 
age divided by chronological age) and is not a measure of intelligence 
at all. Intelligence tests, like all other tests, measure performance 
only. Whether the cause of the similarities or differences in two 
children’s scores is “intelligence” or brightness, capacity, etc. can be 
judged better from longitudinal studies of mental growth in a series 
of tests from which it is possible to judge the maximum development 
at maturity and the contribution of various factors to performance. 
Two individuals who may have equal I.Q.’s on a given day may be 
growing differently and to different maxima. Capacity proves to 
be too complex an entity to be measured easily by a single application 
of a single test (Figure 8). 


Fig. 8. Individual patterns of development in mental age curves. (From 
Hamtramck, Michigan) [Curves plotted against chronological age in months.] 

When a child grows more rapidly than the assumed normal rate, 
his I.Q. will rise, and vice versa. There is, however, no standard 
definition of intelligence nor any proof that intelligence tests measure 
what one chooses to define as intelligence. The whole field of intelli- 
gence testing is wide open for a new type of analysis and standard- 
ization. 
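
The rise can be illustrated with the ratio definition given above, mental age divided by chronological age, here multiplied by 100 as is conventional. The sketch is only an illustration: the function name `ratio_iq` and the three pairs of ages are hypothetical, chosen so that mental age gains 15 months in each 12-month year.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Ratio I.Q.: mental age as a percentage of chronological age."""
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical records: (chronological age, mental age), both in months.
records = [(96, 96), (108, 111), (120, 126)]
iqs = [round(ratio_iq(mental, chrono), 1) for chrono, mental in records]
print(iqs)
```

The quotient climbs from one testing to the next even though nothing about the child has changed except that he is growing faster than the assumed normal rate.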

I have still to prove that scores in tests measure performance 
only, and that performance is heavily loaded with just plain maturity. 

In Detroit we have been giving this year a simple paper and 
pencil test as a measure of the child’s motor performance with a 
pencil. We have purposely excluded as much of mental activity as 
possible, and we have used a battery of three approximately equal 
tests as well as various devices to eliminate some of the chance 
factors known to affect scores (Figure 9). 


Test 1. Draw a circle around the letter like the key letter and put 
a cross on all the other letters. 

Test 2. Draw a circle around the number like the key number and put 
a cross on all the other numbers. 

Test 3. Put a cross in every circle and a circle around every cross. 

Fig. 9. Samples of test directions and items in motor test. [Each direction 
is followed by marked and unmarked sample items.] 


Mean scores in this test range from 18.4 in the kindergarten 
to more than 200 for graduate students when the test is given uni- 
formly throughout the various grade levels. 

For Detroit we have, from a previous testing, similar scores 
from a handwriting test. When each of these sets of scores is ex- 
pressed as the percentage of its maximum at maturity and is plotted, 
the two growth curves almost coincide (Figure 10). We know, how- 
ever, that when the first grade children come to school very few of 
them are able to write. They learn during the first semester, but we 
have as yet no records of their learning curves. The dotted line in 
the figure shows what probably happens. Handwriting scores improve 
until they reach the limit of motor development. From this point 
on, however, they increase only as motor ability increases. Now 
the trend of both the handwriting and the motor scores is from a 
zero score at about two years of age. So learning to write and score 
on a handwriting test are different things only during the first learning 
period. Precisely similar curves could be shown for reading and 
arithmetic. 
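
One way to picture the conversion described above is to divide every score in a series by that series' maximum at maturity. The sketch below assumes this reading; the function name and both score series are hypothetical and are not the Detroit data.

```python
def as_percent_of_maximum(scores, maximum):
    """Express each raw score as a percentage of the maximum at maturity."""
    return [round(100.0 * score / maximum, 1) for score in scores]


# Hypothetical mean scores by age on two tests with very different raw scales.
motor_scores = [20, 60, 120, 200]      # assumed maximum at maturity: 200
handwriting_scores = [8, 24, 48, 80]   # assumed maximum at maturity: 80

motor_pct = as_percent_of_maximum(motor_scores, 200)
handwriting_pct = as_percent_of_maximum(handwriting_scores, 80)
print(motor_pct)
print(handwriting_pct)
```

In raw units the two series look unrelated; as percentages of their own maxima these invented series coincide, which is the kind of agreement the comparison in Figure 10 is meant to show.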


Fig. 10. Comparison of motor development and handwriting. (Each set of scores 
is expressed and plotted as percentages of its own maximum.) [Scores in per 
cents plotted against age in years.] 


It comes to some people as an awful shock that scores in handwriting, 
reading, arithmetic, and intelligence tests, on the average, 
measure closely the same things, since the scores above the primary 
learning steps all tend to coincide when expressed, not in absolute 
units, but in percentages of development. Yet the evidence is quite 
clear. 

When we study individuals, the problem becomes more complicated, 
but it may yet be possible to show that motor scores are closely related 
to reading scores if allowances are made for individual patterns of 
development (Figure 11). 

But, you may say, there is little or no correlation between motor 
scores and either reading scores or semester growths. 

That is right if you accept uncritically conventional statistical 
methods and apply them uncritically. One can correlate two sets of 
any kind of numbers, whether it is proper to correlate them or not. 
In this case it is not proper. Raw scores are not comparable because 
of the effect of the individual growth elements. 

Consider, for instance, the regression equation for volume cor- 
related with length, width, and thickness. A student of mine con- 
structed fifty rectangular blocks of wood (in imagination) out of 
normal distributions of lengths, widths, and thicknesses, combined by 
chance. 

The multiple correlation coefficient for volume in relation to 
length, width, and thickness was $R_{V \cdot LWT} = +.9933$, a nice high 
correlation. The multiple regression equation was 
$V = .5261L + 1.2175W + 6.7812T - 131.6275$, and the standard error of 
estimate was $\sigma_{V \cdot LWT} = 1.818$. 

Fig. 11. Comparison of motor scores with scores in reading and growth in reading. 
(Figures with each arrow are month of chronological age.) [Horizontal axis: 
scores in reading tests.] 

This equation is, of course, an absurd result. In the first place, 
we know that volume is determined by multiplying length, width, 
and thickness, not by adding them; in the second place, length, width, 
and thickness have equal weight in producing volume; and, in the 
third place, if the blocks are stood on end so that width becomes 
length; thickness, width; and length, thickness, the equation becomes 
$V = 1.2175L + 6.7812W + .5261T - 131.6275$; that is, the regression equation 
fits a specific condition only and does not yield a true picture of 
the relationships. 

If we start with the relationship $V = L \times W \times T$ and take the 
logarithm of both sides, the equation becomes 
$\log V = \log L + \log W + \log T$. If now we compute the regression 
equation for the logs of the magnitudes, we obtain 
$\log V = .9261 \log L + 1.0119 \log W + 1.0376 \log T$, and the standard 
error is $\sigma_{\log V \cdot \log L \log W \log T} = .02416$; that is, by 
correlation, if one uses the various elements in their proper relationship, 
it is possible to obtain true weights. But in Figure 11 you were making 
judgments in terms of noncomparable scores, and this invalidates your 
judgments. 
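
The block experiment can be repeated as a rough sketch. The fifty blocks below are not the student's data: their means and spreads are assumptions, the helper names `solve` and `least_squares` are hypothetical, and the fit uses ordinary least squares through the normal equations. Because the volumes here are computed exactly, the logarithmic fit returns weights of 1 rather than the slightly uneven figures quoted above.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def least_squares(predictors, target):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(p)]
    return solve(xtx, xty)


random.seed(0)
# Hypothetical blocks: (mean, spread) for length, width, thickness are assumed.
blocks = [
    tuple(max(1.0, random.gauss(mean, sd)) for mean, sd in ((10, 2), (6, 1), (3, 0.5)))
    for _ in range(50)
]
volumes = [length * width * thick for length, width, thick in blocks]

# Additive model: V = a + bL + cW + dT
additive = least_squares(blocks, volumes)

# Multiplicative model fitted on logarithms: log V = a + b log L + c log W + d log T
log_blocks = [[math.log10(side) for side in block] for block in blocks]
log_volumes = [math.log10(v) for v in volumes]
multiplicative = least_squares(log_blocks, log_volumes)

print("additive fit:   ", [round(c, 3) for c in additive])
print("logarithmic fit:", [round(c, 3) for c in multiplicative])
```

The additive coefficients depend on the particular means chosen for each dimension, while the logarithmic fit gives the three dimensions equal weight.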

The law of growth is expressed by the formula $y = k\,i^{r^{t}}$, in which 
“y” equals the achieved development at time “t,” “i” is the development 
at the starting point, and “r” is the rate of growth.7 
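
As a hedged illustration, the formula can be evaluated directly. The sketch assumes the Gompertz form $y = k\,i^{r^{t}}$, with $k$ taken to be the maximum development at maturity (the text does not define $k$) and with $i$ and $r$ both between 0 and 1; the function name and the constants are hypothetical.

```python
def gompertz(t, k, i, r):
    """Development at time t under y = k * i ** (r ** t).

    k: assumed maximum development at maturity
    i: fraction of k present at t = 0 (0 < i < 1)
    r: rate constant (0 < r < 1); a smaller r approaches k sooner
    """
    return k * i ** (r ** t)


# Hypothetical constants: 1 per cent developed at the start, maximum of 100.
curve = [gompertz(t, k=100.0, i=0.01, r=0.7) for t in range(0, 21, 5)]
print([round(y, 1) for y in curve])
```

Under these assumptions the curve starts at $k \cdot i$, rises continuously, and levels off toward $k$ without exceeding it.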

In other words, in spite of the almost universal assumption
that scores are comparable and relations between factors are additive,
the facts are that scores must be rendered comparable by converting
them into maturation units, and the relationships between elements
are usually multiplicative, necessitating the use of logs. All scores
in tests are the products of multiplicative relationships between the
factors involved, and most of the correlation studies, and all the
elaborate techniques built upon them, are pure fancy, for they rest upon
false assumptions at the outset. Growth distributions are skew and
not normal, and the relationships between factors are multiplicative
and not additive.

7 Those who wish additional confirmation of this important point are referred
to “Logarithmic Correlation Coefficients and Regression Equations,” by Robert
Schrek, in Human Biology 14:95, February, 1942.

Superstitions die hard in education, and any person who attacks 
well-established conventions, as I am doing, can expect only incredulity 
and condemnation. However, an illustration may help make clear 
how vicious and unsatisfactory the results of the conventional use of 
the correlational techniques are. 

A literary college, which shall be nameless, has for many years 
given the Psychological Examination of the American Council on Edu- 
cation, the ACE test. The correlation of scores in this test with 
first semester academic records of the college freshmen is about +.50.* 
For the class that entered in 1931 and was graduated in 1935, ad- 
ditional data are available. The scores of the 866 students in the 
class were sorted into decile groups and for each group the percentage 
graduating, withdrawn, and still in school was computed (Table IV). 
TABLE IV. RELATIONSHIP BETWEEN SCORES ON ACE TEST AND SUCCESS IN
COLLEGE WORK AS MEASURED BY GRADUATION, WITHDRAWAL,
OR INCREASED TIME, N=866

Decile    Percentage    Percentage     Percentage
          graduated     withdrawing    still in college
              54            31               15
              52            33               15
              51            34               15
              41            43               16
              40            44               16
              31            53               16
              30            55               15
              28            55               19
              28            57               15

* Men:    Year   r      Year   r        Women:    Year   r      Year   r
          1928  .53     1934  .53                 1928  .51     1934  .45
          1929  .56     1935  .51                 1929  .56     1935  .51
          1930  .49     1936  .53                 1930  .54     1936  .59
          1931  .49                               1931  .48

However, these data can be organized in a more significant way.
Note that 28 per cent are graduated from every decile, 31 per cent are
withdrawn from every decile, and 15 per cent from every decile are
still in school. That is, for three students out of four (74 per cent),
whether they graduate, withdraw, or take more than four years to
finish has nothing to do with their scholastic aptitude as measured
by the test (Table V, Figure 12). For the remaining 26 per cent
the correlation is probably not +.50 but very high, the percentages
falling very exactly along growth curves as indicated in the figure.
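
The division into constant and variable portions can be checked by simple arithmetic. The sketch below assumes that the "constant" portion of each outcome is the smallest percentage found in any decile, and that the "variable" portion is whatever remains above it; the rows are the nine legible rows of Table IV in their printed order, and the names `table_iv`, `constant`, and `variable` are illustrative only.

```python
# Rows of Table IV as they read in the source, in printed order:
# (per cent graduated, per cent withdrawn, per cent still in college).
table_iv = [
    (54, 31, 15),
    (52, 33, 15),
    (51, 34, 15),
    (41, 43, 16),
    (40, 44, 16),
    (31, 53, 16),
    (30, 55, 15),
    (28, 55, 19),
    (28, 57, 15),
]

# Constant portion: the percentage reached by every decile, i.e. the column minimum.
constant = tuple(min(column) for column in zip(*table_iv))

# Variable portion: what each decile shows above the constant portion.
variable = [tuple(v - c for v, c in zip(row, constant)) for row in table_iv]

print("constant (graduated, withdrawn, still in college):", constant)
print("total constant per cent:", sum(constant))
for row, var in zip(table_iv, variable):
    print(row, "->", var)
```

Under that assumption the constant portions come to 28, 31, and 15 per cent, or 74 per cent in all, which agrees with the figures quoted in the paragraph above.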


TABLE V. RELATIONSHIP BETWEEN SCORES ON ACE TEST AND SUCCESS
IN COLLEGE WORK, ANALYZED INTO CONSTANT AND
VARIABLE PORTIONS

          Constant results                           Variable results                Diff.
Deciles   Per cent  Per cent  Per cent  Total        Per cent  Per cent  Per cent    Graduated
          grad-     with-     still in  per cent     gradu-    with-     still in    minus
          uated     drawn     school    constant     ated      drawn     school      withdrawn
             28        31        15        74           26         0         0          26
             28        31        15        74           24         2         0          26
             28        31        15        74           23         3         0          26
             28        31        15        74           13        12         1          25
             28        31        15        74           12        13         1
             28        31        15        74            3        22         1          25
             28        31        15        74            3        22         1          25
             28        31        15        74            3        24         0          26
             28        31        15        74            0        22         4          22

[Figure 12: percentages plotted by decile, showing the constant percentages graduated, in school, and withdrawn (74% in all) and the variable percentages graduated and withdrawn.]

Fig. 12. Graph of data given in Table VII. (Dotted lines represent computed values.)

These data seem to me to epitomize all I am trying to say to
you this morning; they illustrate the whole situation in the measure- 
ment field. Tests are all right; there is very little the matter with 
them. It is the testers that are at fault, and the analytical devices 
they use so uncritically in the interpretation of test scores. Tests 
measure performance only, and it is quite impossible to determine 
from a single test what a score means. We need to remember al- 
ways that, as Thorndike says, “a pupil’s score in a test signifies 
first such and such a particular achievement, and second, only what- 
ever has been demonstrated by actual correlation to be implied in it.” 
Moreover, the vital elements in education—purpose, character, habits 
of work, etc.—we do not even attempt to measure. Yet, as in the illu- 
stration just shown, these factors determine success for three students 
out of four even in the performances we do measure. 

My message, then, is not that measurement is a failure and the
situation hopeless. Quite the reverse. We are the failures, and 
whether we are hopeless or not depends wholly upon our vision and 
devotion. We know the criteria under which science operates; we 
must apply them critically, ruthlessly, to our own work. We must learn
to differentiate between fact and fancy. We must prove every hypo- 
thesis; we must suspect every assumption. Above all, we must not let 
expediency nor the pressure for results force us into the hypocritical 
position of peddling as scientific truths what we know are really 
frauds and lies. In spite of the failure of educational measurement 
to achieve its ideals as yet, I have great hopes for the future. I 
still believe scientific control of human behavior is possible and will 
be one of the crowning achievements in the new order that lies ahead. 


NEXT STEPS IN EDUCATIONAL MEASUREMENTS 


S. A. Courtis, Professor of Education, University of Michigan


PLANNING for the future is much more difficult than describing 
past events. Any fool can point out existing defects, but the
discussion of plans for improvement always runs the risk of arousing
conflicts and misunderstandings. It is very difficult to describe in
objective terms that which at present exists only in the imagination.

In an attempt to reduce misunderstandings to a minimum, let 
me begin by defining terms, so that we may have at least a common 
vocabulary. 

All life as we know it comes from existing life by a biological 
process which to some degree predetermines certain aspects of the 
pattern of development of that new life. I do not want to raise the 
problem of which is greater, nature or nurture. To me the controversy 
seems a foolish one because both contribute to every aspect of life. 
I merely wish to formulate the common experience that, as a result 
of the biological process that brings him into being, a child inherits 
from his father and mother certain potentialities of development to 
which the name “capacity” is given. Capacity is potential ability. A 
little baby by virtue of his human nature has the capacity to learn 
to walk in spite of the fact that at two days of age he does not have 
the ability even to creep (Figure 1). 


[Figure 1: diagram labeled TRAINING, CONDITIONS, CAPACITY, ABILITY, PERFORMANCE (ACHIEVEMENT), THE INDIVIDUAL.]

Capacity = potential ability
Ability = power to achieve; capacity developed by training
Performance = what is achieved under the given conditions

Fig. 1. Fundamental concepts


Ability is power to achieve. It is developed from capacity by 
experience, or, as we more often say, by education or training. Ex- 
perience is the better term because the individual is affected by social 
forces long before birth. The home, school, church, state, and unor- 
ganized society predetermine the situation into which he is born. 
Social inheritance is just as potent and deterministic as biological
inheritance, and more actively operative throughout life. However,
from my point of view, the individual has within himself the power to
modify the deterministic forces of heredity and environment, so
that, strictly speaking, both heredity and environment lose their
determining power the moment the individual exerts his creative will.
Philosophically, you will see I am a mystic, but for present scientific
purposes I shall proceed as if I were a determinist. Ability, then, is
capacity developed by training.

Ability, however, is not visible to the naked eye. What we meas- 
ure are merely the expressions of ability in action. Ability operates 
without exception (as a determinist must say) in the space-matter-time 
framework of our sensory perceptions. What we measure is per- 
formance or achievement: performance clearly in speaking, singing, 
reading, etc. and achievement when the operation of ability leaves a 
tangible material record of its action, as when a picture is painted, 
an essay written, a test paper marked. I personally prefer the one 
term “performance” for all manifestations of ability, because it con- 
stantly reminds me that in both cases I am merely measuring the 
functioning of ability, not ability itself. 

When ability functions it always does so under certain conditions. 
The conditions have two aspects: internal, such as health, motives, etc.; 
and external, such as heat, light, etc. My writing with a pen differs
somewhat from my writing with a pencil. My writing with my left
hand differs markedly from my writing with my right hand. My letter 
to my wife bears little resemblance in writing or style to my letter to 
my dean. Conditions modify or modulate performance, so that per- 
formance is always a complex resultant of many factors, never a 
pure expression of ability, still less an index of training or capacity. 

The mechanisms involved in performance are complex in them- 
selves, and the ability mechanism is only one of them. A test is an 
external stimulus. The light from the test is received by the eye, 
and the instructions of the examiner fall on the ear (A, Figure 2). 
These organs of reception take in the energy of the stimulus, but 
what happens after that we do not know. It is as if the energy were 
transmitted to a control mechanism, B. This control mechanism ap- 
praises the stimulus in terms of facts, meanings, and values, and 
decides upon the response to be made. It then apparently causes an 
energy-releasing mechanism, C, to release just the amount of energy 
judged appropriate for the situation. Individuals differ in the amount 
of energy they have available for release. 


[Figure 2: diagram of the path from the external test stimulus to performance.]

A = Organs of reception
B = Organs of control
C = Energy-releasing mechanisms
D = Ability mechanisms
E = Motor mechanisms

Fig. 2. Mechanisms involved in performance

Energy and ability are quite distinct. For instance, I have the
ability to walk, but I can walk rapidly or slowly as the occasion seems 
to demand. The amount of energy released we describe as “effort.” 
Effort is thus one factor affecting performance. 

The next move is as if the control mechanism switched the energy 
to the appropriate ability mechanism, D. Here the energy seems to 
have impressed on it a distinctive modulation. From the ability mech- 
anism the energy apparently is transmitted to the appropriate motor 
mechanism, and the operation of the motor mechanism produces the 
performance which we measure. 

Such a hypothetical description of how ability functions in pro- 
ducing performance is both an aid and a hindrance to understanding: 
an aid, if it enables us to plan experimentation which provides for 
objective analysis of the total situation into elemental phases; a hin- 
drance, if the description leads us to consider our assumptions as ob- 
jective realities. 

It is a fact that some element of our make-up is modified by 
training. What ability is or where it resides is not known. But if 
capacity should consist of a mass of nerve cells which are developed 
and organized into a distinctive connective pattern by training, a 
pattern which modulated thereafter all energy passing through it, 
there would be the possibility of measuring the energy and the de- 
velopment of the ability mechanism separately. 


Here we run afoul of the doctrine of specificity. There are some 
who profess to believe that there are no elements in human behavior, 
and that each performance is a creative resultant which defies analysis. 
I reject the doctrine of specificity in toto. I regard it as a lazy man’s 
alibi for lack of effort. That is what they said in each of the sciences 
before some man more analytical and creative than the rest found a 
way. One thing is sure. In every science the period of rapid progress 
and control is initiated by just such an “atomistic” discovery. I be- 
lieve the basic secret has been “discovered,” and a new era of progress 
impends in all the biologic sciences, including education. 

The concept of ability as a mechanism is a difficult one for some 
people to grasp. Perhaps an analogy will help. Consider an automo- 
bile. It has many parts, of which the engine is but one. The engine 
itself is fabricated from steel. A mass of steel before fabrication has 
the capacity to become an engine, but think of the amount of “training” 
the metal must receive before it reaches maturity. Measuring the 
horsepower of an engine is much like measuring ability, except to 
make the analogy perfect you must keep the hood down and the engine 
out of sight. The driver is the control mechanism. He adjusts switch- 
es, starters, clutches, gears to make the automobile go forward or back, 
as he wishes, but that does not change the horsepower of the engine. 
Of course, if you let out the oil and use gasoline mixed with water, 
the performance of the engine will be altered, but not the horsepower
of the engine. If you can get at the engine to measure the number
and diameter of the pistons and the length of the stroke, you can
compute the horsepower of the engine under standard conditions, but
if the hood is sealed down, the only way you could estimate the
horsepower of the engine would be to run the car under a variety of
conditions, up hill, down hill, with no load, with a heavy load, and so on.
All that you could measure would be the performance of the car.

But you will say a child is a personality, not an automobile. 
I will grant that he is more than an automobile in potentialities, but 
he behaves like an engine when you test him. I went into a room in 
an elementary school and gave the children an addition test. In fact 
I gave them four trials of the test, one right after the other. The 
first time I used the conventional instructions. The second time, I 
said, “That was pretty good, but some of you are pretty slow. How 
many think that you can work faster?” Every hand was raised. Then 
I said, “Good. Let’s try the test again, and this time everyone go 
just as fast as he can, like a race. Never mind whether the examples 
are right or not; just see how many you can do in the time allowed.” 
After that test, I said, “You certainly went fast that time, but I’m 
afraid some of your answers are wrong. Let’s try it again, and this 
time be sure every answer is right.” The children worked slowly 
and carefully. Then I said, “Now let’s try the test again and see if
you can make the same score as at first.” A week later I went back 
to the room and said, “I have been scoring the test papers you gave 
me last week. You certainly did very well. I’ve come back today to 
see if you can do as well today.” All were certain they could, so I 
said, “Perhaps you think you can make higher scores today.” Again 
every child was sure he could. So I gave them the test. Then I 
produced a bag of nickels and poured them out into a bowl. I said,
“I will give a nickel to every child who tries more examples in the
next test, and another nickel to every child who gets more examples 
right.” That test cost me more than four dollars. 

How does the human engine behave when you change gears (excuse
me, I mean change testing conditions)? Let us look at Figure 3.

Every child had a different response, but it is possible to general- 
ize. Individual A responded with changes in effort. He expended 
more effort when he went fast; he dropped way down when he went 
slowly. On the fourth trial, he came back to the same score as on 
the first trial. He tried hard on the fifth trial, but the money in- 
centive released more energy than his control could handle. He “went 
to pieces” as we say, and made lower scores than on the first trial.
Individual B also varied in effort, but he made another kind of change 
also. He varied his relative rate and accuracy without changing his 
effort much. I think you will have to admit, however, that changes 
in conditions affect scores in rate tests, and that they affect children 
differently. 

                                                       Scores
Trial  Conditions                                Child A          Child B
                                                 Tried  Right     Tried  Right
  1    Work rapidly but be sure to be right.      28     28        30     30
  2    Work as fast as you can.                   29     29        37     27
  3    Be sure every answer is right.             21     21        30     29
  4    As you worked in Trial 1.                  28     28        32     32
       Another day
  5    Can you do better today?                   33     32        37     35
  6    5¢ if you beat your score.                 28     27        42     33

[Graphs of number tried against number right. Individual A: variation in effort. Individual B: variation in adjustment.]

Fig. 3. Illustration of changes in score with changes in conditions. (Test of
addition combinations)
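
The difference between the two children can be put in figures by dividing the number right by the number tried on each trial. This is only an illustrative sketch: the scores are those tabulated with Figure 3, while the measure "accuracy = right / tried" and the names `scores` and `accuracy` are assumptions introduced here, not a statistic used in the talk.

```python
# (tried, right) for each trial, as tabulated with Figure 3.
scores = {
    "A": {1: (28, 28), 2: (29, 29), 3: (21, 21), 4: (28, 28), 5: (33, 32), 6: (28, 27)},
    "B": {1: (30, 30), 2: (37, 27), 3: (30, 29), 4: (32, 32), 5: (37, 35), 6: (42, 33)},
}

# Accuracy on each trial: the share of attempted examples that were right.
accuracy = {
    child: {trial: right / tried for trial, (tried, right) in trials.items()}
    for child, trials in scores.items()
}

for child, trials in scores.items():
    tried_values = [tried for tried, _ in trials.values()]
    print(
        f"Child {child}: tried ranges {min(tried_values)}-{max(tried_values)}, "
        f"accuracy ranges {min(accuracy[child].values()):.2f}-"
        f"{max(accuracy[child].values()):.2f}"
    )
```

On these figures Child A's accuracy hardly moves while the amount attempted rises and falls, whereas Child B trades accuracy for speed when urged to go fast.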


If we use the average score of a large group, we can almost
completely eliminate the effect of change of effort (Figure 4).

In several handwriting classes we also tested under a variety 
of conditions. The figure shows the change from the fast to the slow 
test. The teacher gave three tests, under normal, fast, and slow con- 
ditions; then the strange examiner gave two. Then the examiner 
used the money incentive, and got higher scores as you can see. Then 
the teacher put on a great appeal for the children to do as well for 
her out of affection. Note how close the adjustment lines are to 
being 45° lines. That is, the sums of the rates and qualities for the 
fast and slow tests are nearly the same. As one goes up, the other 
goes down. There was no change in ability, merely adjustment of 
performance to conditions. 

30 BULLETIN OF THE SCHOOL OF EDUCATION

[Figure 4: vertical axis, quality (Ayres Scale); T = Teacher.]

Fig. 4. Change of score in handwriting with change of conditions. (N = 73)

I could show you many such results in every academic field.
The adjustment factor I can allow for, but the effort factor so far
has defied my efforts to bring it under control. We cannot interpret
any score in any test until we have an objective measure of effort.
I consider I have proved to you that effort varies in different chil- 
dren and can be varied by changing incentives, but we have no way 
of equating the effort one child puts out with that put out by another 
child. Perhaps some of you will be bright enough to devise a way 
of measuring individual effort objectively. It would certainly be a 
great contribution to progress. 

Let me remind you that science operates under the law of the 
single variable, i.e., a given effect may be ascribed with CERTAINTY 
to a particular factor as cause ONLY when all other factors have 
been held CONSTANT or when variations in them have been allowed 
for. So long as several factors contribute to performance, we can 
not tell how much each one contributes until we can measure them 
separately. 

The most we can do at present is to explore an individual’s 
field of variation and locate its center (Figure 5). 

The field always extends down to zero because an individual
may, if he wishes, lay his pencil down and say, “I won’t take your old 
test.” But what the exact shape of the field is, no one can tell. The 
individual in the figure took 22 different tests in handwriting. Have 
you ever seen so many for one individual? His rate ranges from below 20 
up to 90 and his quality from 40 to 90. His ability at the time of 
these tests was probably about 125 (70 + 55), but I would not be 
too sure. Note that this child did better in response to the teacher’s 
appeal than he did for the money incentive. If your practice is to
estimate children’s ability on the basis of a single test, I recommend 
that you do a little exploring by giving many trials under a variety 
of conditions, so that you will not be too complacent about your practice. 

[Figure 5. Left panel: field of variation for one individual (twenty tests, handwriting). Right panel: comparison of two individuals (theoretical).]

Fig. 5. Comparable scores


Consider the figure on the right which is supposed to represent 
two children’s fields of variation. You will note they overlap. Sup- 
pose now on a single test B exerts great effort and makes the score B, 
while A does not half try and makes the score A. On the basis of a 
single test, you are in danger of saying that B is better than A. But 
if by many tests you mapped out the whole field of variation for each 
child and located his centers, Ac and Bc, you would be certain A was
better than B. 
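
The danger of the single test can be shown with a small numerical sketch. Everything in it is hypothetical: the (rate, quality) scores are invented for illustration, the center of a field is taken to be the plain average of the scores, and the index "rate + quality" follows the way the ability of about 125 (70 + 55) was reckoned above.

```python
# Hypothetical (rate, quality) scores from several tests for two children.
child_a = [(70, 60), (75, 55), (40, 50), (80, 62), (72, 58)]
child_b = [(60, 50), (55, 45), (78, 56), (58, 48), (62, 52)]


def center(field):
    """Center of a field of variation, taken here as the mean rate and mean quality."""
    n = len(field)
    return (sum(rate for rate, _ in field) / n, sum(q for _, q in field) / n)


def index(score):
    """Combined index for one score: rate plus quality."""
    rate, quality = score
    return rate + quality


# One unlucky comparison: A on a day of little effort, B on a day of great effort.
single_a, single_b = child_a[2], child_b[2]
print("single test:", index(single_a), "for A against", index(single_b), "for B")

# Comparison of the centers located from all the tests.
center_a, center_b = center(child_a), center(child_b)
print("centers:", round(index(center_a), 1), "for A against", round(index(center_b), 1), "for B")
```

With these invented scores the single test puts B well ahead of A, while the centers of the two fields put A ahead of B.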

Another factor affecting performance is the material we use 
in our tests. What do you mean by ability: ability to deal with 
material the child has never seen before, or ability to achieve after 
practice? One is general, the other specific. How do you tell in any 
test which factor is operating? 

Thorndike has given us the concepts of altitude and range. Con- 
sider spelling, for example. As we move up from “at” to “catastro- 
phe” we reach harder and harder words; more complex words, I prefer 
to say, because difficulty is itself an ambiguous concept. Thorndike 
places “at” among the first thousand important words, while “catastrophe”
occurs in the tenth thousand. But similarly “above” occurs in the
first thousand and “evade” in the tenth thousand. Yet intrinsically, 
“above,” “awake,” “abode,” “evade” are equally easy to spell. They 
are all built on the same pattern. Most children, however, have ex- 
periences with “above” before they do with “evade.” Shall we call 
one harder than the other? And what about “ear” and “era”? Are 
these words 10,000 words apart in difficulty? Or would it be better 
to analyze more closely and test for separate elements? I’m all for 
exact analysis. 

You may reply, “Such matters are all very important for rate 
tests: that’s why I never use them. Such considerations don’t apply 
to power scales.” 

Well, let us examine a power test in spelling made by choosing
two words from each column of the Ayres Scale, 52 words in all. 
Examples of the words chosen from the first part of the scale are 
“do” and “me,” “go” and “on,” and “can” and “run.” These increase 
in difficulty to “brother” and “coming,” “country” and “morning,”
“pretty” and “recover,” and toward the end of the scale include such 
words as “immediate” and “committee,” “decision” and “principle,” 
and “judgment” and “recommend.” 

Such a test is typical of all power tests and batteries of tests 
like the Binet and others. How do children respond to tests of this 
type? Let us not just consider mass results, but analyze individual 
responses (Figure 6). 

Fig. 6. Illustration of variation in performance of children in power tests.
(Ayres’ Spelling Scale)

I have selected the poorest three spellers, the ten average spellers, 
and the best two in the class for study. Each long rectangle represents 
the 52 words in the test as spelled by a child. The white parts are 
words correctly spelled. The single line represents the point where 
the first word was missed, the double line the point where the last 
one was spelled correctly. 

Note that the rectangle for each child has at least two different 
regions and many of them have three. I have called these the region 
of perfect control, the region of variable control, and the region of no 
control. The words a child can always spell and the words he cannot
spell at all are no tests for him. What we seek to locate is the center
of the region of variable control. But how differently the children
respond. The
second child from the top has no region of variable control, apparently, 
while the third child has a very long one. Why? I do not know. I 
suspect the answer is partly tied up with the ideas of effort and 
adjustment, and the ideas of altitude and range previously discussed. 
Of one thing I am certain. A single trial of the test is a very poor 
basis to judge from. The scale needs to be restandardized in terms 
of the training necessary to achieve success; in terms of growth in- 
stead of status. Then we could tell whether altitude or range or both 
were involved. Also, if I were measuring, I would use a 10-word test 
to locate the approximate level of the child’s development. Then I 
would give him four or more tests, each of 40 or 50 words, at just 
above and below his indicated level. Also I should try these tests 
under a variety of conditions. Then I might be in a position to really 
estimate his true ability in spelling. 
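
The three regions can be located mechanically from one child's record on the scale. The following is a sketch only: the function name `control_regions`, the use of half-open index ranges, and the twelve-word record are all assumptions made for illustration. The region of perfect control is taken to end at the first word missed, and the region of variable control to end at the last word spelled correctly, as in the description of Figure 6.

```python
def control_regions(correct):
    """Split a record (True = spelled correctly, in scale order) into three regions.

    Returns half-open (start, end) index ranges for the regions of perfect,
    variable, and no control.
    """
    n = len(correct)
    # Index of the first word missed (n if nothing was missed).
    first_miss = next((i for i, ok in enumerate(correct) if not ok), n)
    # Index of the last word spelled correctly (-1 if nothing was right).
    last_right = max((i for i, ok in enumerate(correct) if ok), default=-1)
    variable_end = max(first_miss, last_right + 1)
    return {
        "perfect": (0, first_miss),
        "variable": (first_miss, variable_end),
        "none": (variable_end, n),
    }


def variable_center(regions):
    """Midpoint (as a word index) of the region of variable control, or None if empty."""
    start, end = regions["variable"]
    return (start + end - 1) / 2 if end > start else None


# Hypothetical record on a twelve-word scale of increasing complexity.
record = [True, True, True, True, False, True, True, False, False, True, False, False]
regions = control_regions(record)
print(regions)
print("center of variable control:", variable_center(regions))
```

A child whose every correct word precedes his first miss has an empty region of variable control, which corresponds to the second child described above.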

If I really wanted to know about the meaning of the child’s 
performance, I would have to measure him thus three or four times 
a year, plot his growth curve and estimate meaning by reference of 
scores to his own curve. 

It is most remarkable that biologists and educators have not 
defined what they mean by growth. I have for myself. Let me share 
some very helpful concepts with you (Figure 7). 

Fig. 7. Definition of growth


By growth I mean the progress toward a specific objectively 
defined maturity made by an immature organism acted on by en- 
vironmental influences under constant conditions. Growth always 
takes place in the space, matter, energy, time framework. A child 
in the slums does not grow as he would if he were transplanted to 
a home of wealth. Every change in conditions affects growth, but 
what many biologists and educators have not seen is that there are 
two problems here, not one. The immature organism is always grow- 
ing. If we keep conditions constant, it grows; if we alter the condi- 
tions, its rate of growth changes but it still grows. 

Therefore I have divided my analysis of growth into two parts. 
I found out by experimentation: (1) how growth proceeds under con- 
stant conditions, and (2) what happens when modifying influences act. 

Growth within a given framework always proceeds according
to a universal law. I know why, but it is a long story and I do not 
have time to tell it now. Growth under constant conditions yields a 
skew curve. That is, 50 per cent of the development occurs during 
the first third of the time; the last half of the curve is longer than 
the first, as an Irishman would put it. Modifying influences have 
three points of attack: (1) They may modify the organism, as when 
a boy loses his sight. (2) They may modify the environmental in- 
fluences acting, as when a child who has been taking one lesson a 
week in music doubles his training and takes two lessons a week. (3) 
They may modify the conditions under which growth takes place, as 
when the weather changes from hot to cold, wet to dry, or vice versa. 

I know it is very hard to believe that plants, animals, human 
beings, and social institutions follow identical patterns of growth 
when conditions are constant, but they do (Figure 8). The way a 
child cuts his teeth can be described by the same formula as the way 
he learns to spell, or develops ideals of integrity. The literature is 
full of curves which do not follow the universal pattern, it is true, 
but that is because the conditions have not been kept constant. The 
true scientific statement is that “to the degree conditions during 
growth are kept constant, to that degree growth tends to approach a 
universal pattern.” 

Fig. 8. Illustrations of simple growth curves

For 16 years now I have been collecting growth curves, and the
statement I have just made summarizes my experience. Until you 
have a chance to prove it true or false with your own data, please 
accept it for the purposes of our present discussion. No one will see 
you if you also keep your fingers crossed. 

The universal growth curve throws light on lots of problems 
and explains many difficulties we measurement men have encountered.
Let us study it a little (Figure 9). 

[Figure 9: Skew curve. Unequal percentages of growth are made in equal periods of the
maturation period. 75% (passing mark) represents only 42.46% of total maturation.]

Fig. 9. Characteristics of a single cycle growth curve. (Thorndike, in Measurement
of Intelligence, page 270, states that the assumption of normality of distribution
“was about as safe as any other one assumption and much easier to operate with.
Hence we gradually slid into the habit of using the doctrine. This fashion became
so strong that in recent years psychologists have assumed symmetry even though
units taken at their face value produced a markedly skew distribution.”)


The curve is a skew curve, concave at first and convex as it 
rounds off to a maximum. Curves differ greatly in dimensions, but 
it is always possible to generalize them by dividing the growth meas- 
urements by the maximum at maturity. This transforms them into 
percentages of development. Also, it is always possible to divide the 
measurements of time, in days or years, etc., by the total period of 
maturation, thus changing them also into percentages, or isochrons 
as we call them. An isochron is defined as one per cent of the total 
period of maturation. For equal changes in the time percentages, the 
changes in the development percentages are very unequal. Thus A, 
B, and C represent unequal developmental changes for equal time 
changes. 

At first sight it might seem impossible to describe such a twisty
curve mathematically, but it turns out that a simple equation
consisting of three constants and one variable describes it exactly. The
only variable element is time. The constants are “k,” the maximum
toward which growth proceeds; “i,” the degree of development at the time
chosen as the starting point (growth never starts from nothing; the
organism that grows always has some size); and “r,” the rate at which
the organism grows.

The formula is an exponential equation and most teachers do 
not know how to deal with such equations. But cheer up. Here 
is a card which gives for every percentage and tenth of development 
the corresponding value on the isochronic axis, so it is easy to
transfer from one axis to the other. The use of isochrons transforms the
exponential equation into a simple linear equation, the kind you had
in algebra, and thus makes computations easy. In the isochronic
system, equal growths are made in equal intervals of time. That is, 
if a child grows seven isochrons on Monday, he will grow seven more 
on Tuesday, and so on, so long as the conditions remain constant. 
And if conditions do not remain constant, the change in the isochronic 
rate of growth is a measure of the modifying influence that brought 
about the change. 
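The formula itself is not reproduced in the text, so the sketch below assumes a Gompertz-type equation, one common exponential growth form with three constants (maximum, starting degree of development, and rate). The constants are hypothetical, and the transformed scale here is not calibrated to the isochron units on the card; it only shows how a double-log transform turns such a curve into a straight line in time.

```python
import math


def growth(t, k, i, r):
    """Gompertz-type growth: k is the maximum, i the starting fraction of k
    (0 < i < 1), and r the rate constant (0 < r < 1)."""
    return k * i ** (r ** t)


def linear_scale(y, k):
    """Double-log transform of development y/k; linear in t for the curve above."""
    return -math.log(-math.log(y / k))


# Hypothetical constants, chosen only for illustration.
k, i, r = 100.0, 0.05, 0.8

sizes = [growth(t, k, i, r) for t in range(1, 7)]
scaled = [linear_scale(y, k) for y in sizes]

raw_gains = [b - a for a, b in zip(sizes, sizes[1:])]
scaled_gains = [b - a for a, b in zip(scaled, scaled[1:])]

print("Raw gains per unit time:   ", [round(g, 2) for g in raw_gains])
print("Scaled gains per unit time:", [round(g, 4) for g in scaled_gains])
```

Under this assumed form the raw gains differ from one interval to the next, while the gains on the transformed scale are all equal to -ln r, which matches the statement that equal growths are made in equal intervals of time.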

Perhaps you are disturbed by that skew curve. Skew curves 
come from skew distributions and skew distributions do not have 
much standing in conventional statistical circles. Sorry, but all 
growth distributions are skew. If you believe in facts, read what 
Thorndike says about why we assume normality, then look at some 
real distributions (Figure 10). 

In 1931 I had an opportunity to measure school children abroad. 
The table shows the growth in dentition of girls. The ages are given 
at the top and the number of teeth at the left. The insert is the 
normal pattern for dentition in an individual. It is a two-cycle curve. 
During childhood the normal child cuts 14 teeth and during adolescence 
he cuts 14 more. Notice the equation under the graph. It is simply 
two growth curves added. This equation describes the development 
of dentition in this girl so precisely that the average difference be- 
tween the mathematically computed teeth she had at the different 
ages and the number she actually had is two hundredths of a tooth. 
Now look at the distributions for the various ages (Figure 10), and 
see how, as individuals approach a maximum point, the frequencies 
pile up and the distributions become skew. This is typical of all 
growth data. 
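A two-cycle curve of the kind described for dentition can be sketched by adding two growth curves. The Gompertz-type form and every constant except the two maxima of 14 teeth are assumptions made for illustration; they are not the constants fitted to the girl in the figure.

```python
def cycle(t, k, i, r, start):
    """One Gompertz-type growth cycle that begins near time `start`."""
    return k * i ** (r ** (t - start))


def teeth(t):
    """Two growth cycles added: 14 teeth in childhood and 14 in adolescence."""
    childhood = cycle(t, k=14.0, i=0.01, r=0.5, start=6.0)
    adolescence = cycle(t, k=14.0, i=0.01, r=0.5, start=11.0)
    return childhood + adolescence


# Computed number of permanent teeth at even ages from 4 to 20 years.
by_age = {age: round(teeth(age), 1) for age in range(4, 21, 2)}
print(by_age)
```

The sum rises in two waves and levels off toward 28, the combined maximum of the two cycles.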

Well, what of it, some of you are sure to be thinking. What 
practical use can a teacher or a measurement man make of such 
information? 

You are interested in the measurement of teacher efficiency, 
aren’t you? Well, teaching is a modifying influence and is measured 
by the changes it produces. BUT you must allow for the growth 
which takes place anyway. 

Fig. 10. International comparisons in child development: total number of permanent
teeth cut by 7,288 boys from Italy, Switzerland, England, and Scotland

Consider the growth of performance in spelling the word “sincerely,”
which in this city occurs in the word list for the 8B grade
(Figure 11). Perhaps you had better consider also the growth of
performance in spelling the word “customary,” which does not occur
in the course of study at all.

Fig. 11. Teaching as a modifying influence in spelling

The curve in the figure is a curve computed from the formula.
The actual measurements are indicated by circles. The largest error 
at any point is 7 per cent and the average error ±1.8 per cent. I
find many teachers who appear shocked to discover that children 
learn to spell words even when they are not taught in school. The 
really important matter, however, is the question, “When 40 per cent 
of the children have had sufficient opportunity to learn, so that they 
can spell the word successfully, why haven’t the other 60 per cent 
learned to spell it correctly too?” The answer is that teaching is 
not the only factor determining whether children learn to spell. A 
child simply cannot learn to spell until he is mature enough or until 
he has had practice enough for his peculiar nature. Children are 
unique. No two children are alike. They learn at different rates 
and in different ways. You cannot make a child learn until he is 
physiologically ready to learn any more than you can ripen a green 
apple before it is mature.

Teachers wear themselves out trying, I admit, but what good 
does it do? Let us look now at the curve for “sincerely.” It fol- 
lowed along nicely until the seventh grade, then the rate changed. 
The curve was progressing at the rate of 5.3 isochrons, and then 
it jumped to a rate of 12.8 isochrons. Did the teachers make all 
that change? No, the children would have grown 5.3 isochrons without 
teaching. So the teachers increased the rate about one and a half 
times. But see what happened in the next year. Back the curve 
tumbles to the position it would have had if there had been no teach- 
ing. So what is the use? Wouldn’t it pay to determine each child’s 
natural rate of growth and just do the things necessary to keep him 
growing, as we do with plants and animals? That was a great idea 
of Froebel’s, a kindergarten. Would that all our schools were places 
where teachers were helping children grow and giving the same dif- 
ferential attention to children that a gardener gives to sunflowers and 
violets. You do not expect them to grow alike. Well, what about 
the future doctor, the lawyer, the engineer, the business man, the 
artist? Do you expect them all to grow alike? Or does your school 
system respect individual differences? 
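The allowance for growth that takes place anyway can be made explicit with the two rates quoted for “sincerely.” The sketch below is simple arithmetic on those figures; the variable names are introduced here for illustration.

```python
# Rates quoted in the address for the word "sincerely", in isochrons.
natural_rate = 5.3    # rate of growth before the word was taught
observed_rate = 12.8  # rate during the year of teaching

# Only the excess over the natural rate can be credited to teaching.
teaching_gain = observed_rate - natural_rate
relative_increase = teaching_gain / natural_rate

print(f"Gain credited to teaching: {teaching_gain:.1f} isochrons")
print(f"Increase over the natural rate: {relative_increase:.2f} times")
```

The increase works out to a little over 1.4 times the natural rate, which is the “about one and a half times” of the text.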

But you will alibi, “How can I know about the differences in 
children if my school doesn’t keep cumulative individual records?” 
That’s right—you can’t possibly. It would be a great step in ad- 
vance if you kept cumulative records, though. And you could if you 
really wanted to. 

Let us go back to teacher measurement. Don’t you see how
unfair it is to measure a teacher by the scores her children make?
You need growth records before a superintendent can be fair to you, 
and you need growth records before you can be fair to the children. 

There is more to this child development business than one would 
think. Consider individual curves of growth, for instance (Figure 12). 

Fig. 12. Individual growth curves

I am using curves of growth in height and in reading for comparison.
A is a tall girl, and B a short one. At maturity one is
more than 11 inches taller than the other. Their curves differ in
other ways, yet both are two-cycle curves and both conform to the
universal pattern. What shall we say about C and D in reading
scores? Is C more able than D? Would you say, about A and B,
that A is more healthy than B just because she is taller? The 
two sets of curves are very much alike. Both are two-cycle curves. 
The individuals in both sets differ widely. Is it not likely that a 
teacher would fail as completely in raising D’s reading scores to equal 
C’s as she would in raising B’s height to A’s? Scores in reading tests 
measure developmental patterns of reading performance more than 
they measure reading ability. 

Note the break in C’s curve at about 135 months. What happened 
to her I do not know. I could have found out if I had been her 
teacher and had had these curves as a basis of interpretation. But 
if I were a surveyor coming just that day and giving a single test, 
how wrong I would have been in my appraisal of C’s ability. 

What do you think when you look at an age-grade table (Figure 
13, upper half)? The ten-year-olds range from 2B to 6B grades, 
while the 5B’s range from 9 years of age to 14. There are 2,637 
children represented in the table. Shall we get an average score for 
all? “That’s silly,” you reply. “Some children have had more training 
than others, and some are older than others. Some are bright and 
some are dull.” Well, how much allowance do you make for such 
factors? The lower half of the table gives the average scores made 
by each age-grade group. Notice how they vary. In some grades 
the youngest and brightest children make the best scores, but the 
old and extremely dull in the same grade make nearly the same 
scores. What a mess! 

Numbers of boys by ages and grades
Fig. 13. Effect of factors on scores of 2,637 boys in grades 1A to 7B made in
spelling the word “ARMY”


Suppose we try to bring order out of chaos by finding a differ- 
ence between two cells which differ in a single variable. There are 
none. The circles along the longer diagonal line, for instance, repre- 
sent the scores of children in the B grades who are approximately 
equal in intelligence, but one year older from circle to circle and with 
one year additional training. The jump from any cell to the next 
invariably involves two elements. Only the scores along the diagonal 
line form a good growth curve. All the others represent various dis- 
tortions caused by shift in factors. In the last analysis, the only 
safe basis is the individual. But the table makes clear the extent 
to which the various factors affect performance. Without analysis, 
how can you tell what a score represents? 

In making comparisons between teachers or school systems, I 
suggest it would be a step in advance to compare the scores of equal 
age-grade groups and discard all mass averages. 
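Why mass averages mislead can be shown with a small sketch. The records below are entirely hypothetical (age in years, grade, score); the function names are introduced here and do not come from the address.

```python
from collections import defaultdict
from statistics import mean


def cell_means(records):
    """Average score for each (age, grade) cell."""
    cells = defaultdict(list)
    for age, grade, score in records:
        cells[(age, grade)].append(score)
    return {cell: mean(scores) for cell, scores in cells.items()}


def comparable_cells(means_a, means_b):
    """Pairs of averages for the age-grade cells present in both systems."""
    shared = sorted(set(means_a) & set(means_b))
    return {cell: (means_a[cell], means_b[cell]) for cell in shared}


# Hypothetical records for two school systems: (age, grade, score).
system_a = [(10, "5B", 60), (10, "5B", 70), (12, "5B", 40)]
system_b = [(10, "5B", 68), (12, "5B", 44), (12, "5B", 42), (12, "5B", 46)]

mass_a = mean(score for _, _, score in system_a)
mass_b = mean(score for _, _, score in system_b)
by_cell = comparable_cells(cell_means(system_a), cell_means(system_b))

print(f"Mass averages: A = {mass_a:.1f}, B = {mass_b:.1f}")
for cell, (a, b) in by_cell.items():
    print(f"Age-grade cell {cell}: A = {a:.1f}, B = {b:.1f}")
```

In this made-up case system A has the higher mass average only because it has fewer over-age children; in every comparable age-grade cell system B does better.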

Few persons realize the extent to which children differ. Mass 
averages of boys’ scores by grades differ from those of girls’ scores 
(Figure 14). Children in B classes differ from those in A classes.
In other words, performance scores reflect sex and season of birth
as well as intelligence, training, capacity, and a host of other factors.

Fig. 14. Effect of sex and season of birth. (Average age of children in
kindergarten is 60 months; of those in grade 1, 72 months; grade 2, 84 months;
grade 4, 108 months; grade 6, 132 months; and grade 8, 156 months.)


I want to make brief suggestions on two more important points 
before closing. 

The first is testing. Teachers who consider the facts I am 
presenting tend to ask, “Do you mean that it isn’t possible to measure 
ability in reading?” 

There is a way by differential testing but we must have a meas- 
ure of effort before it can be made to work satisfactorily. 

Reading ability can be measured, I believe, by giving two tests 
just alike except that one involves one more mental element than the 
other. Individual A makes a score, let us say, of 48 in Part 1, which 
is almost wholly mere word recognition, and a score of only 24 in
Part 2, which involves thinking of the relation of one word to an- 
other. One score is 50 per cent of the other. As A matures, his 
scores in Part 1 and Part 2 would increase, let us suppose, to 60 and 30, 
but the relationship would still be 50 per cent. If, however, A learned 
to read and think better, the score in Part 2 would increase faster 
than the score in Part 1, and the percentage would increase corre- 
spondingly. That is, by using two tests and the ratio, all common 
factors, age, sex, intelligence, maturity level, etc., would cancel out 
and the ratio might possibly be a pure measure of ability. I do not 
know. The lead is worth following. It gives a totally different picture
from that given by conventional scores. They cannot both be right.
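The differential ratio can be worked through with the scores used in the illustration above. The function name is introduced here; the third case, in which Part 2 rises to 36, is a hypothetical score added to show the ratio moving.

```python
def differential_ratio(part1, part2):
    """Score on Part 2 as a per cent of the score on Part 1."""
    return 100.0 * part2 / part1


earlier = differential_ratio(48, 24)   # A's first scores
matured = differential_ratio(60, 30)   # both parts rise together with maturity
improved = differential_ratio(60, 36)  # hypothetical: Part 2 rises faster

print(earlier, matured, improved)
```

Maturation alone leaves the ratio at 50 per cent, while a faster gain on Part 2 raises it, which is the sense in which the common factors are supposed to cancel out.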

So far we have dealt only with academic tests of knowledge,
skill, and power. There are other important matters to be considered, 
matters commonly ignored: temperament, aptitudes, character, etc. 
These also influence scores and must be reckoned with. How, we do 
not at present know, but it is easy to secure evidence of the facts. 

In Hamtramck, Michigan, they tried for a while a system of 
marking which differentiated effort and achievement. Achievement is 
an objective fact and should be described in objective terms without 
merit words attached, just as we report the fact that a child is three 
feet tall. Effort, however, involves a moral element. How hard a 
child tries influences how much he grows. 

The Effort Scale had ten divisions in it. They were called cre- 
ative success, efficient improvement, routine achievement, inefficient 
trying, wasteful trying, awareness of opportunity without action, in- 
difference, evasion, pretense, and open rebellion. These represent 
stages from good to bad effort.

The correlation of effort to achievement in English was r = .24, 
which is negligible (Figure 15). The reason is not hard to find. An 
immature boy who means well and tries hard cannot possibly make as 
high an achievement score as a mature well-developed boy who is only 
half trying. The table furnishes food for much thought. If teachers 
and administrators can ever be pried loose from regarding scores as 
measures of ability, perhaps we in education can attack the really 
important problems of developing character and ideals. 


TABLE I. RELATION OF EFFORT TO ACHIEVEMENT IN JUNIOR HIGH SCHOOL 

In conclusion, let me summarize briefly the recommendations
I have made for the next steps of progress to be taken and suggest
some reference reading in case any of you wish to follow these matters
farther. These recommended next steps in educational measurements
are as follows:


1. Make much more objective, impersonal, critical appraisal.

2. Keep as many cumulative records of each individual as possible:
physical, mental, emotional, educational, social.

3. For the present translate scores in tests into “ages” based on
mean scores of children in-grade-at-age.

4. Ultimately translate scores into maturation units based on each
individual’s own growth curve.

5. Invent an objective measure of effort.

6. In control experiments, prove groups equal in achievement and
rates of growth before applying the experimental factor.

7. Use the differential technique to measure factors.

8. Compare school systems by achievements of comparable age-grade
groups.

9. Explore an individual’s field of variation by many trials of
a test under a variety of conditions to determine his central
achievement.

10. Remember always that no single test measures anything but
performance-under-given-conditions.


A Study of the Achievement of Freshman Students
Holding County Scholarships 


MERRILL T. EATON 


THE chief purpose of this study is to determine (1) whether the
Indiana University Scholarship Committee selects superior students
for county scholarships, and (2) whether the students selected for
county scholarships achieve a higher scholastic standard during the
first semester of their freshman year than do those not holding such
scholarships.


Subjects 


For comparative purposes three groups of students enrolled in 
Indiana University for the semester under consideration were se- 
lected. The first of these groups was made up of 103 beginning 
freshmen holding county scholarships. Not all of these subjects had 
been awarded their county scholarships on the basis of academic 
proficiency, however, as some had been selected because of marked 
superior ability in a special field, such as music, dramatics, or ath- 
letics. Therefore, a few of the students in this group had not made 
a superior high school record nor did they achieve highly on the stand- 
ardized tests given them as subjects in this study. 

The second group was made up of 135 beginning freshmen who 
had applied for county scholarships but who had not been awarded 
them. These students were sufficiently interested in attending the 
University to enroll without the aid of county scholarships. 

The third group included 117 beginning freshmen selected at 
random but not including students in either of the other two groups. 
This group was made up by first taking the tenth name in the student
directory. If this name was the name of a freshman enrolled in 
his first semester in the University who had neither received nor 
applied for a county scholarship, this student was included in the 
random selection. If he did not meet these qualifications, then his 
name was replaced by the name of the first person listed below it 
who did meet the requirements. After a name was selected, another 
ten were counted, and the same procedure was repeated. 
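The selection procedure for the random group can be sketched as follows. This is one reading of the description, not the study's own worksheet: it assumes the next ten names are counted from the name actually selected, and the directory and eligibility rule below are hypothetical.

```python
def systematic_sample(directory, eligible, step=10):
    """Take every `step`-th name, replacing an ineligible name by the first
    eligible name listed below it (assumed reading of the procedure)."""
    selected = []
    position = step - 1  # index of the tenth name
    while position < len(directory):
        # Move down the list until an eligible name is found.
        while position < len(directory) and not eligible(directory[position]):
            position += 1
        if position >= len(directory):
            break
        selected.append(directory[position])
        position += step  # count another ten from the selected name
    return selected


# Hypothetical directory of 50 entries; entries 10, 11, and 31 do not qualify.
directory = list(range(1, 51))
not_qualified = {10, 11, 31}
sample = systematic_sample(directory, lambda name: name not in not_qualified)
print(sample)
```

In this example the tenth name does not qualify, so the twelfth is taken, and the count of ten resumes from there.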

These three groups are referred to in this study as the scholar- 
ship group, the non-scholarship group, and the random selection group. 
For the sake of saving space initials are often used to indicate the 
groups, those used being SC, NS, and RS, respectively. 


Data Used 


The data used in this study were compiled from three different
sources: (1) the high school records of the subjects; (2) the grades
earned by the subjects during their first semester at Indiana University;
and (3) the grades made by the subjects on certain standardized
tests.




Grades made in high school courses were secured from transcripts 
in the office of the Director of Admissions of the University. Informa- 
tion was also secured from this office concerning the application for 
and awarding of county scholarships. Grades made by the subjects 
during the first semester of their freshman year at Indiana Univer- 
sity were taken from record cards filed in the Registrar’s office. 

Three standardized tests were given to the subjects: (1) the 
American Council on Education Psychological Examination, referred 
to hereafter in this study as the ACE; (2) the American Council 
on Education Cooperative English Test; and (3) the Iowa Silent 
Reading Test. The scores made by the subjects on these tests were 
obtained from the University’s Testing Bureau. 


Results 


The results of this study are presented in table form, so that 
comparisons of the three groups may be easily and directly made. 
In Table I the students are divided according to centile rank in the 
high school graduating class. 


TABLE I. NUMBER AND PER CENT OF STUDENTS IN EACH GROUP, DISTRIBUTED
ACCORDING TO CENTILE RANK IN THEIR HIGH SCHOOL GRADUATING CLASS

Centile rank in high school                  Students
graduating class              Group     Number      Per cent

Upper 5 per cent              SC          68           66
                              NS      [illegible]      35
                              RS           8            7

Next 5 per cent               SC          17           17
                              NS          31           23
                              RS          12           10

Next 5 per cent               SC           8            8
                              NS      [illegible]  [illegible]
                              RS          11            9

Lowest 85 per cent            SC          10           10
                              NS          31           23
                              RS          86           74

Total                         SC         103          100
                              NS         134          100
                              RS         117          100


According to Table I, 66 per cent of the scholarship students 
and 35 per cent of the non-scholarship students came from the upper 
5 per cent of their graduating class, while as many as 74 per cent 
of the students selected at random came from the lowest 85 per cent 
of their class. Only 7 per cent of the RS group came from the highest 
5 per cent and only 10 per cent of the SC group came from the lowest 
85 per cent. On the whole, therefore, it seems evident that a larger 
proportion of the scholarship students than of either of the other groups 
made excellent scholastic records in high school. This is not sur- 
prising, however, for, as has already been explained, previous academic 
achievement is one of the standards used in selecting students to be 
awarded county scholarships. 
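The percentages quoted from Table I can be checked against the counts that remain legible for the scholarship and random selection groups. The sketch assumes the four rows run from the upper 5 per cent down to the lowest 85 per cent, as the surrounding text implies; the function name is introduced here.

```python
def per_cents(counts):
    """Each count as a whole-number per cent of the group total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


# Counts by centile band, highest band first, as legible in Table I.
sc_counts = [68, 17, 8, 10]   # scholarship group, 103 students
rs_counts = [8, 12, 11, 86]   # random selection group, 117 students

sc_per_cents = per_cents(sc_counts)
rs_per_cents = per_cents(rs_counts)

# Share of each group in the upper 10 per cent (first two bands together).
sc_upper_ten = round(100 * sum(sc_counts[:2]) / sum(sc_counts))
rs_upper_ten = round(100 * sum(rs_counts[:2]) / sum(rs_counts))

print(sc_per_cents, rs_per_cents, sc_upper_ten, rs_upper_ten)
```

The counts reproduce the 66, 10, 7, and 74 per cent cited above, and the first two bands together give the 83 and 17 per cent reported in the summary.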

Table II gives the number of students in each group, divided 
according to their high school records, the total number of semester
hours of work carried by each group during the beginning semester 
of their freshman year at Indiana University (an average of about 
15 semester hours carried by each student), the number and per cent 
of hours in which they received smoke-ups (the University’s mid- 
semester warning to failing students), the number and per cent of 
hours in which they received each letter grade, and the average number
of credit points¹ received.

The division of the subjects into “superior,” “average,” and “weak” 
groups was determined on the basis of grades made in high school. 
If A’s and B’s predominated on the high school record card, the stu- 
dent was considered superior, if B’s and C’s predominated he was 
classed as average, and if C’s and D’s were most prevalent he was 
considered weak. According to Table II, all of the students awarded
county scholarships were in either the superior or the average class,
while as many as 11 per cent of the RS group were classed as weak. 
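The text does not say exactly how “predominated” was judged, so the following is only one possible reading of the rule: count the grades on the record falling in each pair of letters and take the pair with the most, with ties going to the higher class. The function name and the sample records are hypothetical.

```python
from collections import Counter


def classify(grades):
    """Classify a high school record as superior, average, or weak
    according to which pair of letter grades is most frequent."""
    counts = Counter(grades)
    bands = [("superior", "AB"), ("average", "BC"), ("weak", "CD")]
    # max() keeps the first band on a tie, so ties go to the higher class.
    return max(bands, key=lambda band: sum(counts[g] for g in band[1]))[0]


# Hypothetical records, one letter per course.
print(classify("AABBC"))
print(classify("BCCCB"))
print(classify("CDDDC"))
```

A different tie-breaking rule, or one based on grade averages, would classify borderline records differently.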

Only 2 to 4 per cent of the students receiving smoke-ups in each 
group were classed as superior students, while 51 per cent of the 
RS group who were included in the “weak” classification received 
smoke-ups. In the scholarship group, 475 of the 507 hours of A and 
538 of the 638 hours of B were earned by “superior” students. 

The “superior” students also received a higher average number 
of credit points than the others, and the scholarship group received 
a higher number than either of the other two groups except among 
the “weak” students, which included no scholarship students. 

This table indicates rather clearly that county scholarships were 
awarded to students whose academic work was above average both in 
high school and in the first semester of college. 

In Table III the data are given concerning the college success 
of the three groups when distributed according to quartile ranking 
on the ACE. Included in this table are the total number of hours 
carried during the first semester of the freshman year, the number 
and per cent of hours in which smoke-ups were received, the number 
and per cent of hours in which each letter grade was received, and 
the average number of credit points received. 

According to Table III, more than three fourths of the scholarship 
group and about 40 per cent of the non-scholarship group fell in the 
highest quartile on the ACE. There was a higher percentage of non- 
scholarship students than of those in the other two groups in the 
second quartile, and a higher percentage of students in both of the 
quartiles below the median, there being several times as many RS 
students in the lowest quartile as in either of the other two groups. 

In each quartile the students holding county scholarships received 
fewer hours of smoke-ups, fewer hours of low grades, more hours 
of high grades, and more credit points than did those in either of the 
other groups. The total percentages show rather conclusively that 
the scholarship group was superior to the other two groups and that 
the random selection group made the poorest grades of any of the 
three groups. Only 3 per cent of the scholarship group received
smoke-ups, while 12 per cent of the non-scholarship group received
them, and as many as 27 per cent of the RS group received them.
A much smaller percentage of students actually received no credit
for part of their work, however. Of the 1,548 semester hours of work
taken by the scholarship group, 507, or 33 per cent, received a grade
of A and 638, or 41 per cent, received a grade of B. Thus approximately
three fourths of the grades received by this group were above the
University’s average grade of C, while only 3 per cent were below
the average. The group who had applied for county scholarships
but had not received them was somewhat above average but fell considerably
below the SC group, as only 16 per cent of the grades were
A while as many as 12 per cent were below C.

¹ At Indiana University three credit points are given for each semester hour of
A, two for each hour of B, one for each hour of C, and none for each hour of D,
while one credit point is deducted for each semester hour in which an F is received.
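The credit-point scale given in the footnote can be applied as in the sketch below. It assumes that the “average number of credit points” is an average per semester hour, which the group averages of roughly 1 to 2 suggest but the text does not state; the student record is hypothetical.

```python
# Credit points per semester hour, from the footnote on the University's scale.
POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": -1}


def average_credit_points(hours_by_grade):
    """Average credit points per semester hour (assumed definition)."""
    total_hours = sum(hours_by_grade.values())
    total_points = sum(POINTS[grade] * hours for grade, hours in hours_by_grade.items())
    return total_points / total_hours


# Hypothetical 15-hour semester: 5 hours of A, 6 of B, 3 of C, 1 of F.
example = {"A": 5, "B": 6, "C": 3, "F": 1}
print(round(average_credit_points(example), 3))
```

On this definition a record of all B's averages 2.0, so the scholarship group's figure of 2.033 corresponds to slightly better than a B average.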

Of the 507 semester hours of A made by the scholarship group, 
450 were made by students who fell in the upper quartile on the 
ACE, 52 were made by students who fell in the second quartile, 5 
by those in the third quartile, and none by the one student who 
was in the lowest quartile. On the other hand, in the SC group 
the 5 hours of work for which no credit was received were taken by 
students in the highest quartile. In the light of other data in the 
table it seems probable that this work was incomplete or that credit 
was deferred for some other reason than that the student failed. 

The differences in the average number of credit points earned 
by each of the three groups are approximately equal. The scholar- 
ship group earned an average of 2.033 credit points, the non-scholarship 
group an average of 1.512 credit points, and the random selection 
group an average of 1.062 credit points. By far the highest average 
number of credit points was received by those in the scholarship group 
who fell in the highest quartile on the ACE. For the most part, in 
each quartile the three groups fell in the same order in average num- 
ber of credit points earned as they did in the total group. Only two 
exceptions to this occurred. The NS students who fell in the 51 to 75 
quartile on the ACE made a lower average than the RS group, and 
the SC students in the 26 to 50 quartile made a slightly lower average 
than the NS group. This latter case might be explained by the fact 
that so few of the scholarship students (only five) fell in this quartile, 
and they were probably among those who were awarded county scholar- 
ships on some basis other than academic achievement. 

On the whole, the consistency with which the SC group surpassed 
the NS group and the NS group surpassed the RS group in high 
grades received indicates that scholarships were awarded to students 
who were superior in their scholastic achievement during their first 
semester at the University. 

The question sometimes arises as to whether there are any no- 
ticeable differences in ability among the students enrolled in different 
schools of the University. Table IV has been compiled for the pur- 
pose of showing the average number of credit points, and its centile 
ranking, made by each group when divided according to school in 
which enrolled and quartile ranks on the ACE. 

The data in Table IV indicate that, regardless of the school in 
which enrolled, the students holding county scholarships earned a 
higher average number of credit points than those who applied for 
but did not receive scholarships, and that this latter group was 
superior in this respect to the group of students selected at random.
There are slight variations in this relationship, but such differences 
are probably due to the fact that there were far more students in 
some of the groups than in others. In every group the largest per 
cent of the students were enrolled in the College of Arts and Sciences, 
while only nine students, three in each group, were enrolled in the 
School of Music. The small number enrolled in the School of Education 
and the School of Music, as well as in some of the group and quartile 
divisions of the College of Arts and Sciences and the School of Busi- 
ness, would tend to cast some doubt on the validity of the data con- 
cerning the students in these groups. This is true even in the totals, 
and therefore comparisons between different schools are not reliable. 
On the whole, the data for the various schools are very similar, the 
widest variance being in the School of Education, where the scholar- 
ship group received a considerably higher average number of credit 
points and those in the random selection group a lower number of 
points than did those groups in other schools. In general, the figures 
in this table indicate that the scholarship group in each school was 
superior to the other two groups in average number of credit points 
earned. 

Table V is similar to Table III except that in it the groups are 
divided according to schools in which enrolled instead of according 
to quartile rank on the ACE. 

Again in Table V the scholarship group shows consistent superior- 
ity over the other two groups and the non-scholarship group ranks 
higher in achievement than the random selection group. Four per 
cent of the SC group in the College of Arts and Sciences received 
smoke-ups as compared with 27 per cent of the RS group; and none 
of the SC group in the School of Education and the School of Music 
groups received smoke-ups while 33 per cent and 31 per cent of the 
RS group in the two schools, respectively, received these warnings. 
In every school the scholarship group made more A’s than either 
of the other groups and the random selection group made more C’s, 
D’s, and F’s. And in every school the average number of credit points 
earned is highest for the scholarship group, somewhat lower for the 
non-scholarship group, and lowest for the random selection group. 
It seems evident, therefore, that the students awarded county scholar- 
ships do superior work in the University regardless of the school in 
which they are enrolled. 

In Table VI are given the median percentiles made by each of 
the three groups of subjects on each part of the American Council 
on Education Psychological Examination, the American Council on 
Education Cooperative English Test, and the Iowa Silent Reading 
Test. All of these tests were given to the subjects upon entering 
Indiana University. 

The data in Table VI show that on every part of every test the 
scholarship group made much higher scores than the non-scholarship 
group and the non-scholarship group in turn made much higher 
scores than the random selection group. The average difference 
in median centile made by the three groups on all parts of the 
test is about 22 centiles. In only one case was the difference
less than 10 centiles; this was between the non-scholarship and
random selection groups on the spelling section of the Cooperative
English Test, where the difference was only 7.9. On the other hand,
there was a difference of 37.4 centiles between the scores of the
scholarship and non-scholarship groups on the vocabulary section of
the same test. These median centiles seem to give convincing evidence
that, so far as ability to make high scores on these tests is concerned,
the scholarship group is superior to either the non-scholarship or
random selection groups.


TABLE V. TOTAL NUMBER OF HOURS, NUMBER AND PER CENT OF HOURS OF SMOKE-UPS
AND OF EACH LETTER GRADE, AND AVERAGE NUMBER OF CREDIT POINTS MADE BY EACH
OF THE THREE GROUPS, DISTRIBUTED ACCORDING TO UNDERGRADUATE COLLEGE IN
WHICH ENROLLED

[Table body illegible.]


TABLE VI. THE MEDIAN CENTILE MADE BY EACH OF THE THREE GROUPS ON EACH
OF THE PARTS OF THE ACE, COOPERATIVE ENGLISH TEST, AND
IOWA SILENT READING TEST

                                        Median centile

        Number             ACE                   Cooperative English Test          Iowa Silent
Group   of        Quanti-  Linguis-                     Spell-  Vocab-             Reading
        students  tative   tic       Total     Usage    ing     ulary    Total     Test

SC        103      83.7     87.6     88.4      78.9     81.7    86.9     85.8      86.5
NS        134      58.5     71.5     67.0      55.8     56.8    49.5     69.7      63.3
RS        117      37.3     41.8     36.5      34.2     48.9    36.2     35.4      41.4
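The differences cited from Table VI can be recomputed from the median centiles. The figures below are transcribed from the table as it reads in this copy, and the column labels are shortened forms introduced here.

```python
columns = [
    "ACE quantitative", "ACE linguistic", "ACE total",
    "English usage", "English spelling", "English vocabulary", "English total",
    "Iowa Silent Reading",
]

# Median centiles by group, in the column order above.
medians = {
    "SC": [83.7, 87.6, 88.4, 78.9, 81.7, 86.9, 85.8, 86.5],
    "NS": [58.5, 71.5, 67.0, 55.8, 56.8, 49.5, 69.7, 63.3],
    "RS": [37.3, 41.8, 36.5, 34.2, 48.9, 36.2, 35.4, 41.4],
}


def gaps(upper, lower):
    """Difference in median centile, column by column, rounded to one decimal."""
    return [round(a - b, 1) for a, b in zip(medians[upper], medians[lower])]


comparisons = {"SC over NS": gaps("SC", "NS"), "NS over RS": gaps("NS", "RS")}

# Every difference smaller than 10 centiles.
small = [
    (label, column, gap)
    for label, values in comparisons.items()
    for column, gap in zip(columns, values)
    if gap < 10
]
all_gaps = [gap for values in comparisons.values() for gap in values]
average_gap = sum(all_gaps) / len(all_gaps)

print("Differences under 10 centiles:", small)
print("Largest difference:", max(all_gaps))
print("Average difference:", round(average_gap, 1))
```

The single small difference and the largest one match the 7.9 and 37.4 centiles cited in the text; the average can be compared with the figure of about 22 centiles.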


Summary 


The results of this study are unusually consistent in showing 
that, of the subjects used, the scholarship group was superior to 
the non-scholarship group and the non-scholarship group was superior 
to the random selection group. The principal conclusions formed 
from the study are: 

1. The majority of the subjects awarded county scholarships 
came from the upper 5 per cent of their high school graduating 
class. Only 10 students came from the lower 85 per cent and it is 
probable that some of these were among those awarded scholarships 
on the basis of some special ability rather than of academic standing. 
As many as 83 per cent of the scholarship group fell in the upper 
10 per cent of their high school graduating class, while 58 per cent 
of the non-scholarship group and only 17 per cent of the random 
selection group had done this well in high school. 

2. Subjects who had done superior work in high school received 
fewer smoke-ups and low grades and more high grades and credit 
points during their first semester at Indiana University than did 
those who had done either average or poor work in high school. 
This would indicate that ability to succeed in school work carries 
over from high school to college. High school grades are therefore 
a good measure to use in selecting scholarship students. 

3. When the subjects were divided according to their quartile 
rank on the ACE test, the three groups again fell in the same order, 
with the scholarship group in the lead and the random selection group 
below the others. 

4. Within each school the groups again fell in the same order. 
This would indicate that the choice of school in which to enroll is 
not greatly affected by the scholastic ability of the student. Slight 
variations in the consistency with which the scholarship group surpassed 
the non-scholarship group and the non-scholarship group surpassed 
the random selection group are probably caused by the small number 
of subjects in some of the divisions. 

5. The scholarship group showed decided superiority on all three 
of the standardized tests given to the subjects, and the non-scholarship 
group was consistently superior to the random selection group. 

6. In all three types of information secured concerning the 
subjects studied—high school achievement, success in college work, 
and achievement on certain standardized tests—the scholarship group 
was definitely superior to the non-scholarship group and the non- 
scholarship group in turn was superior to the random selection group. 
These facts indicate that the Scholarship Committee succeeds in selecting 
students who are worthy, from the point of view of ability to do 
college work, of being awarded county scholarships. 
A List of Bulletins in the Field of Education, 
Indiana University 


By KATHLEEN DUGDALE 


The School of Education at Indiana University has published the 
following bulletins. All of these may be obtained from the University 
Bookstore for 50 cents postpaid unless otherwise noted. 


Proceedings of the High School Principals’ Conference (November 23 and 24, 
1923). Vol. I, No. 1, 1924. 85 p. (Supply exhausted.) 

Investigation of Nursing as a Professional Opportunity for Girls, Part I, 
Technical Study; Part II, Vocational Information Bulletin. By Florence E. Blazier. 
Vol. I, No. 2, 1924. 69 p. 

Proceedings of the Eleventh Conference on Educational Measurements. Vol. 
I, No. 3, 1925. 141 p. 


Proceedings of the High School Principals’ Conference (November 14 and 
15, 1924). Vol. I, No. 4, 1925. 49 p. (Supply exhausted.) 

First Revision of the Bibliography of Educational Measurements. Compiled 
by the Bureau of Cooperative Research. Vol. I, No. 5, 1925. 147 p. (Supply 
exhausted.) 

Proceedings of the Twelfth Conference on Educational Measurements. Vol. I, 
No. 6, 1925. 76 p. 


The Effect of Population upon Ability to Support Education. By Harold F. 
Clark. Vol. II, No. 1, 1925. 28 p. 

Proceedings of the High School Principals’ Conference (November 20 and 21, 
1925). Vol. II, No. 2, 1925. 77 p. (Supply exhausted.) 

A Cross-Indexed Bibliography on School Budgets. By Harold F. Clark. Vol. II, 
No. 3, 1926. 66 p. 

A Comparison of the Results Made on Certain Standardized Tests by Pupils 
in the Bloomington High School Who Were Taught in Classes of the Same 
Grade by University Student Teachers and by Regular High School Teachers. By 
Carl G. F. Franzén. Vol. II, No. 4, 1926. 19 p. 

Proceedings of the Thirteenth Annual Conference on Educational Measurements. 
Vol. II, No. 5, 1926. 103 p. 

When to Issue School Bonds. By Harold Florian Clark and Paul Royalty. Vol. 
II, No. 6, 1926. 16 p. 

Students’ Attitude Toward Examinations. By Grover T. Somers. Vol. III, 
No. 1, 1926. 48 p. 

Proceedings of the High School Principals’ Conference (November 12 and 13, 
1926). Vol. III, No. 2, 1926. 27 p. 

Index Numbers in School Administration. By Harold F. Clark. Vol. III, 
No. 3, 1927. 35 p. 

Topical Analysis of 234 School Surveys. Compiled by the Bureau of Cooperative 
Research. Vol. III, No. 4, 1927. 111 p. (Supply exhausted.) 

Proceedings of the Fourth Annual Conference on Elementary Supervision. Vol. 
III, No. 5, 1927. 64 p. 

Proceedings of the Fourteenth Annual Conference on Educational Measurements. 
Vol. III, No. 6, 1927. 66 p. 

Some Phases of the Junior College Movement. By I. Owen Foster, Harold F. 
Clark, Willard W. Patty, and Leo M. Chamberlain. Vol. IV, No. 1, 1927. 125 p. 
(Supply exhausted.) 


Second Revision of the Bibliography of Educational Measurements. By Henry 
Lester Smith and Wendell William Wright. Vol. IV, No. 2, 1927. 251 p. (75¢) 

Bibliography of School Buildings, Grounds, and Equipment, Part I. By Henry 
Lester Smith and Leo Martin Chamberlain. Vol. IV, No. 3, 1928. 326 p. (75¢) 

Proceedings of the High School Principals’ Conference (November 18 and 19, 
1927). Vol. IV, No. 4, 1928. 54 p. 

The Economic Effects of Education. By Harold F. Clark. Vol. IV, No. 5, 
1928. 39 p. 

Proceedings of the Fifteenth Annual Conference on Educational Measurements. 
Vol. IV, No. 6, 1928. 73 p. 

Proceedings of the Fifth Annual Conference on Elementary Supervision. Vol. 
V, No. 1, 1928. 54 p. 
Proceedings of the High School Principals’ Conference (November 16 and 17, 
1928). Vol. V, No. 2, 1928. 33 p. 


The Development and Use of a Composite Achievement Test. By Wendell 
William Wright. Vol. V, No. 3, 1929. 90 p. 


An Analysis of the Attitudes of American Educators and Others Toward a 
Program of Education for World Friendship and Understanding. By Henry Lester 
Smith and Leo Martin Chamberlain. Vol. V, No. 4, 1929. 109 p. 


Tentative Program for Teaching World Friendship and Understanding in 
Teacher Training Institutions and in Public Schools for Children Who Range 
from Six to Fourteen Years of Age. By Henry Lester Smith and Sherman Gideon 
Crayton. Vol. V, No. 5, 1929. 54 p. 

Proceedings of the Sixteenth Annual Conference on Educational Measurements. 
Vol. V, No. 6, 1929. 96 p. 

Proceedings of the Sixth Annual Conference on Elementary Supervision. Vol. 
VI, No. 1, 1929. 73 p. 

An Analysis of the Duties of County School Superintendents and Superin- 
tendents of Schools in Certain Cities in Indiana. By Henry Lester Smith and Leo 
Martin Chamberlain. Vol. VI, No. 2, 1929. 94 p. 


Proceedings of the High School Principals’ Conference (November 22 and 23, 
1929). Vol. VI, No. 3, 1930. 51 p. 

Cooperative Studies in Secondary Education. By Henry Lester Smith and Carl 
G. F. Franzén. Vol. VI, No. 4, 1930. 121 p. 

Proceedings of the Seventeenth Annual Conference on Educational Measurements. 
Vol. VI, No. 5, 1930. 103 p. 


Proceedings of the Seventh Annual Conference on Elementary Supervision. 
Vol. VI, No. 6, 1930. 102 p. 


A Study in Teacher Supply and Demand in Indiana. By I. Owen Foster, 
Robert K. Devricks, Harry N. Fitch, Earl C. Bowman, and George L. Roberts. 
Vol. VII, No. 1, 1930. 177 p. 

Proceedings of the High School Principals’ Conference (November 7 and 8, 
1930). Vol. VII, No. 2, 1930. 70 p. 

The Philosophy of Human Relations: Individual and Collective—A Source Book. 
By Henry Lester Smith and Harold Littell. Vol. VII, No. 3, 1931. 326 p. (75¢) 